Sky Qiu

TMLE Papers

Auto-updated list of TMLE-related papers tracked from Semantic Scholar queries.

Note: this list is not complete and may include irrelevant articles; it will be refined over time.

Last refreshed: 2026-02-21T18:13:19Z

Keywords queried: tmle, targeted maximum likelihood estimation, targeted minimum loss based estimation, super learner, super learning, highly adaptive lasso

Total tracked papers: 916
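
The page does not say how the queries are issued or merged, so the following is only a minimal sketch of one way a keyword-driven tracker like this could work. It assumes the public Semantic Scholar Graph API search endpoint and records shaped like its responses (a `paperId` field per paper); `build_query_url` and `merge_results` are hypothetical helper names, not part of this site.

```python
from urllib.parse import urlencode

# Keywords copied from the "Keywords queried" line above.
KEYWORDS = [
    "tmle",
    "targeted maximum likelihood estimation",
    "targeted minimum loss based estimation",
    "super learner",
    "super learning",
    "highly adaptive lasso",
]

# Assumption: the public Semantic Scholar Graph API paper search endpoint.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_query_url(keyword, limit=100):
    """Build a search URL for one keyword (hypothetical helper)."""
    params = {
        "query": keyword,
        "fields": "title,year,authors,abstract",
        "limit": limit,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def merge_results(results_by_keyword):
    """Deduplicate papers by paperId and record which keywords matched each one."""
    merged = {}
    for keyword, papers in results_by_keyword.items():
        for paper in papers:
            entry = merged.setdefault(
                paper["paperId"], {**paper, "matched_keywords": []}
            )
            if keyword not in entry["matched_keywords"]:
                entry["matched_keywords"].append(keyword)
    return merged
```

Merging by identifier rather than by title is one plausible explanation for how a single paper can carry several entries under "Matched Keywords" while being counted once in the total.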

Yearly Papers Trend

Number of papers per year from 2006 to 2025 (coverage: 884/916 papers with year metadata in range).

Papers per year:
2006: 1 paper
2007: 1 paper
2008: 1 paper
2009: 9 papers
2010: 6 papers
2011: 16 papers
2012: 14 papers
2013: 11 papers
2014: 10 papers
2015: 11 papers
2016: 18 papers
2017: 33 papers
2018: 56 papers
2019: 49 papers
2020: 59 papers
2021: 95 papers
2022: 96 papers
2023: 106 papers
2024: 135 papers
2025: 157 papers
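
As a quick check on the coverage figure, the per-year counts listed above should sum to the 884 papers reported as having year metadata in range. The sketch below does that arithmetic and shows one way such a tally could be computed from paper records. The record shape (a dict with a `year` key) and the function name `yearly_counts` are assumptions for illustration.

```python
from collections import Counter

# Per-year counts as listed on this page.
PAGE_COUNTS = {
    2006: 1, 2007: 1, 2008: 1, 2009: 9, 2010: 6,
    2011: 16, 2012: 14, 2013: 11, 2014: 10, 2015: 11,
    2016: 18, 2017: 33, 2018: 56, 2019: 49, 2020: 59,
    2021: 95, 2022: 96, 2023: 106, 2024: 135, 2025: 157,
}
TOTAL_TRACKED = 916  # "Total tracked papers" reported on this page


def yearly_counts(papers, first_year=2006, last_year=2025):
    """Count papers per year, skipping records with a missing or out-of-range year."""
    counts = Counter(
        p["year"]
        for p in papers
        if p.get("year") is not None and first_year <= p["year"] <= last_year
    )
    # Include years with zero papers so the trend has no gaps.
    return {year: counts.get(year, 0) for year in range(first_year, last_year + 1)}


covered = sum(PAGE_COUNTS.values())
coverage = covered / TOTAL_TRACKED
print(f"{covered}/{TOTAL_TRACKED} papers in range ({coverage:.1%})")
```

The remaining 32 papers are those outside the 2006 to 2025 window (for example, the 2026 entries) or without a recorded year.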

Contributor Ranking

Contributors with at least 2 papers, sorted by paper count in descending order. Each count is followed by the contributor's name. Total shown: 741.

172
Mark van der Laan (the godfather himself)
45
Maya L. Petersen
35
Laura B. Balzer
31
Susan Gruber
30
Alan E. Hubbard
21
David C. Benkeser
17
M. Kamya
17
Mireille E. Schnitzer
16
Antoine Chambaz
15
Iván Díaz
15
Rachael V. Phillips
14
R. Pirracchio
13
Diane V. Havlir
13
G. Chamie
13
Jeremy R Coyle
12
Ivana Malenica
12
Jane Kabami
12
Joshua Schwab
12
N. Hejazi
11
H. Rytgaard
11
J. Ayieko
11
Kara E. Rudolph
11
Oleg Sofrygin
11
Robert W Platt
11
Thomas Gerds
10
Andrew N. Mertens
10
Michael Schomaker
10
Sherri Rose
9
Edwin D. Charlebois
9
J. Colford
9
Miguel Angel Luque Fernandez
9
Mohammad Ali Mansournia
9
S. Lendle
8
Cheng Ju
8
Christian Torp-Pedersen
8
E. Kakande
8
Junming Shi
8
Romain S. Neugebauer
8
Zachary Butzin-Dozier
7
Alejandro Schuler
7
Ashley I. Naimi
7
David Y T Chen
7
E. Polley
7
Joshua R Nugent
7
Lina M. Montoya
7
Noémi Kreif
7
Peter B. Gilbert
7
Wenjing Zheng
7
Yunwen Ji
6
A. Owaraganise
6
Aurélien F. Bibaut
6
Brian D. Williamson
6
C. S. Camlin
6
K. Shiba
6
Katsunori Kondo
6
L. A. Celi
6
M. Carone
6
Michael Rosenblum
6
Mohammad Ehsanul Karim
6
Nicholas T Williams
6
Weixin Cai
5
Alan S. Go
5
Alexander Luedtke
5
Arthur Chatton
5
David McCoy
5
E. Bukusi
5
Edward H. Kennedy
5
Jennifer Ahern
5
João Matos
5
Katherine L. Hoffman
5
Krzysztof Mnich
5
Lisa M. Bodnar
5
Margarita Moreno-Betancur
5
Nahid Sultana
5
Rena C. Patel
5
Sky Qiu
5
Stephen P Luby
5
Tristan Struja
5
W. Rudnicki
5
Wenxin Zhang
4
A. Lin
4
Amir Almasi-Hashiani
4
Andrew Wilson
4
B. Mukherjee
4
Benjamin F. Arnold
4
Carina Marquez
4
Catherine A Koss
4
D. Bangsberg
4
D. Kwarisiima
4
Dana E. Goin
4
E. Moodie
4
F. Mwangwa
4
Geoffrey Ecoto
4
Hana Lee
4
Haodong Li
4
Herbert P Susmann
4
Hossein Mozafar Saadati
4
Ira B. Wilson
4
J. Grembi
4
Jennifer L. Skeem
4
Jonathan Levy
4
Judith Chung
4
Jun Aida
4
Kajsa Kvist
4
Lu Zhang
4
Lucia C. Petito
4
Mahbubur Rahman
4
Marcus Lind
4
Masahiro Kato
4
Milena Gianfrancesco
4
N. Sang
4
P. Chaffee
4
R. Morello-Frosch
4
Rachel Abbott
4
Ronald Herrera
4
S. Zakir Hossain
4
Seyed Saeed Hashemi Nazari
4
T. Zewotir
4
V. Sarovar
4
Yi Li
4
Yohann Foucher
3
A. Benedetti
3
A. Boulle
3
A. Ferrara
3
A. Golinska
3
A. Khamseh
3
A. Krych
3
A. Law
3
A. Polewko-Klim
3
A. Walkey
3
Adel Daoud
3
Anders Munch
3
Arman Alam Siddique
3
Awa Diop
3
Ayoosh Pareek
3
B. Fireman
3
Bagus Sartono
3
C. Cohen
3
C. P. Ponting
3
Camille Maringe
3
Catherine Lee
3
Chansoo Kim
3
Chenchen Yu
3
Chi Zhang
3
Craig A. Magaret
3
D. Menzies
3
D. North
3
D. Rizopoulos
3
David M. Weinstock
3
David S. Morris
3
Denis Talbot
3
E. LeDell
3
E. Schenck
3
Edward L Briercheck
3
Elizabeth T. Rogawski McQuade
3
Eric Hurwitz
3
F. Xue
3
Fabiola Valvert
3
Georgia D. Tomaras
3
Gihan M. Ali
3
I. Kawachi
3
Ian R. White
3
J. Alderden
3
J. Brooks
3
J. Buse
3
J. Franklin
3
J. Onnela
3
J. Tarp
3
J. Wolfson
3
J.E. Haberer
3
Jacqueline M. Torres
3
Jay P Graham
3
John B. Carlin
3
Justin Manjourides
3
K. Himes
3
K. K. Sørensen
3
K. Radon
3
K. Stevenson
3
L. Amusa
3
Laan
3
Liangyuan Hu
3
M. Andersen
3
M. Davies
3
M. Kartal
3
M. Nazemipour
3
M. Puligandla
3
M. Resche-Rigon
3
Marcos Mauricio Siliézar Tala
3
Marilyn Nyabuti
3
Matthew D. Hickey
3
Matthew J Smith
3
Md Faisal Kabir
3
Michael G Nash
3
Monika A. Izano
3
Mucunguzi Atukunda
3
Murali Krishna Pasupuleti
3
N. Bosch
3
N. Sampson
3
Ori Stitelman
3
Oscar Silva
3
P. Yazdanfard
3
R. K. Martin
3
R. Wyss
3
Robert Terbrueggen
3
Ronald C Kessler
3
S. Beentjes
3
S. Ellis
3
S. Sabour
3
S. Schneeweiss
3
S. Wastvedt
3
Sainath Patil
3
Samuel L. Dixon
3
Sanjay Patel
3
Sarang Deshpande
3
Serpil Kılıç Depren
3
Sohail Nizam
3
Stephanie Chisolm
3
Steve Ferreira Guerra
3
T. Abrahamsen
3
T. Clark
3
Thomas Carpenito
3
Timothy Guyon
3
U. Pedersen-Bjergaard
3
W. Grobman
3
W. Lesiński
3
Y. Matsuyama
3
Y. Mehrabi
3
Y. Natkunam
3
Ya-Hui Yu
3
Yan Liu
3
Yining Lu
2
A. Anzalone
2
A. Bener
2
A. Boyd
2
A. Delong
2
A. Elduma
2
A. Ercumen
2
A. F. Dadi
2
A. Fenstad
2
A. Gadzinski
2
A. Hamy
2
A. Hebestreit
2
A. Juul
2
A. K. Waschka
2
A. Kahkoska
2
A. Kharsany
2
A. Kheir
2
A. L. Wong
2
A. Latouche
2
A. MacMahon
2
A. Mira
2
A. Moltó
2
A. Møller
2
A. Nguyen
2
A. Persson
2
A. Rahimi Foroushani
2
A. Rivera
2
A. Shoab
2
A. V. Torreira
2
A. Yang
2
A. Yazawa
2
A. Zonderman
2
A. Çalık
2
Abigail R Cartus
2
Adriano Dias
2
Ahmed M. Mansour
2
Aibin Qu
2
Ajit Govind
2
Alaa O. Khadidos
2
Alastair Bennett
2
Alejandra Benitez
2
Alejandro Rodríguez
2
Alex Sankin
2
Alirio Bastidas
2
Alyce S Adams
2
Amit Grover
2
Andrea Manca
2
André Fonseca
2
Angela B. Smith
2
Angelo Tortora
2
Anita Natalia Varga
2
Anne Ruhweza Katahoire
2
Antonio Paredes
2
Arusha Patil
2
Ashish M. Kamat
2
Asma Elsony
2
B. Grandal
2
B. Taiwo
2
B. Zareini
2
Bin Hu
2
Bingxiao Li
2
Bopha Chrea
2
Brian R. Lane
2
Brock B O'Neil
2
Bruce A. Levy
2
Bryan A. Comstock
2
C. Achenbach
2
C. Börnhorst
2
C. Chute
2
C. Cordeiro
2
C. Howe
2
C. M. Sauer
2
C. Moccia
2
C. Musahl
2
C. Pizzi
2
C. Rongieres
2
C. Roux
2
C. Selmer
2
C. Sirois
2
C. T. Ekstrøm
2
Carlos García Meixide
2
Carlos Ruíz-Frutos
2
Carol E. Golin
2
Carolyn McCloskey
2
Chad R Ritch
2
Charles C. Peyton
2
Charles R. Newton
2
Chi-Shin Wu
2
Chia-Ming Chang
2
Chonghao Wang
2
Chris J. Kennedy
2
Christine P. Stewart
2
Christopher L. Camp
2
Chuandi Jin
2
Chuizheng Meng
2
Cosmas Zyambo
2
Cristian C Serrano-Mayorga
2
D. Azrael
2
D. Barouch
2
D. Black
2
D. Jahed Armaghani
2
D. Moukheiber
2
D.A. Regier
2
Damazo T. Kadengye
2
Daniel Mtai Mwanga
2
David Etoori
2
David J. Graham
2
Deirdre Weymann
2
Di Zhang
2
Diarmuid Grimes
2
Dimitra Karagkouni
2
Diogo M. F. Mattos
2
Doug B MacLean
2
E. Eisen
2
E. Geng
2
E. Houpt
2
E. Oken
2
E. Solorzano
2
E. Stuart
2
Efstathios D. Gennatas
2
Eliza C. Miller
2
Elizabeth Arinitwe
2
Elsa D. Ibáñez-Prada
2
Emanuel Christ
2
Emanuel Krebs
2
Emilie Højbjerre-Frandsen
2
Emmanuel Ruhamyankaka
2
Erick M Wafula
2
Erick M. Marigi
2
Erika M Wolff
2
Eugene K. Lee
2
Eva-Maria Wild
2
F. Folke
2
F. Gnesin
2
F. Le Borgne
2
F. Reyal
2
F. Tanser
2
F. W. Haug
2
F. Wen
2
Fahimeh Hadavimoghaddam
2
Faith Kagoya
2
Fan Li
2
Francisco López
2
Frank Eriksson
2
Fredrick J Opel
2
G. C. Alexander
2
G. Moatshe
2
G. Moirano
2
G. Schmajuk
2
G. Schubert
2
G. Sotgiu
2
Gabriel J. Escobar
2
Gabriella Barratt Heitmann
2
Geeta Reddy
2
Gilmer Valdes
2
Guanbo Wang
2
Guillaume Barbalat
2
Guiqian 贵乾 Sun 孙
2
H. Bonneau-Chloup
2
H. Christensen
2
H. D'Couto
2
H. Hikichi
2
H. Prozesky
2
H. Tsai
2
H. Visnes
2
H. Yonis
2
Hai Zhu
2
Hamdan Mustafa Hamdan Ali
2
Hao Sun
2
Haolin Li
2
Heather L. Cook
2
Helen Bell-Gorrod
2
Helio C. Neto
2
Helio Rubens Nunes
2
Hemalkumar B Mehta
2
Henry M. Blumberg
2
Hind A Beydoun
2
Honghu Liu
2
Howard Liu
2
Hui Wang
2
I. Martín-Loeches
2
I.C. Williams
2
Ismaïl Ahmed
2
Iván Díaz
2
J. C. Bassett
2
J. Cappelleri
2
J. Day
2
J. Ellen
2
J. Erlen
2
J. Frey
2
J. Hogan
2
J. Kukreja
2
J. Lynch
2
J. Platts-Mills
2
J. R. Lacalle-Remigio
2
J. S. Ohlendorff
2
J. Schmittdiel
2
J. Simoni
2
J. Simpson
2
J. Smoller
2
J. Vanderpuye-Orgle
2
J. Yazdany
2
JM van Dongen
2
Jacek Skarbinski
2
Jaffer Okiring
2
James Peng
2
Janice Litunya
2
Janna L. Williams
2
Jason Johnson-Peretz
2
Jason Poulos
2
Jeffrey S. Montgomery
2
Jeffrey W. Nix
2
Jenna Wong
2
Jenney R. Lee
2
Jennifer M. Taylor
2
Ji-su Park
2
Jianwen Cai
2
Jiayi Ji
2
Jie Liu
2
Jie Zhu
2
Jin Jin
2
Jinma Ren
2
Joan A. Casey
2
Joanita Nangendo
2
John Bosco Tamu Munezeo
2
John L. Gore
2
John P. Mickley
2
Jon Steingrimsson
2
Jonathan L. Wright
2
Josep Gómez
2
João Marcos Bernardes
2
Juan David Gutiérrez
2
Juan Gómez-Salgado
2
Juanran Feng
2
Judith E. Bosmans
2
Jue Lin
2
Julia H. Arnsten
2
Junjie Shen
2
Jürgen Beck
2
K. Chamie
2
K. Filion
2
K. Holakouie-Naieni
2
K. Hsu
2
K. Kragholm
2
K. Lee
2
K. Moore
2
K. Nepple
2
K. Okoroha
2
K. Scarpato
2
K. Schmidt
2
K. Seaton
2
K. Soerensen
2
K. Tanner
2
Kaiwen Hou
2
Kamal S. Pohar
2
Kamaldeep Joshi
2
Kareme D. Alder
2
Karla Diaz-Ordaz
2
Karla DiazOrdaz
2
Katherine J. Lee
2
Kathryn E. Stephenson
2
Kathy Goggin
2
Katya Zelevinsky
2
Kenneth H. Mayer
2
Kirsten E. Landsiedel
2
Klara R. Klein
2
Kook-Hwan Oh
2
Kristin C. Caolo
2
Kristin E. Porter
2
Kristin M Follmer
2
Kyra Gan
2
L. Blais
2
L. Cushing
2
L. Degenhardt
2
L. F. Reyes
2
L. Gama
2
L. Macleod
2
L. Mariani
2
L. Richiardi
2
L. Trupin
2
L. Unicomb
2
L. Weishaupt
2
Lama Moukheiber
2
Lara Lewis
2
Larry G Kessler
2
Lars van der Laan
2
Laura C Myers
2
Lauren D. Liao
2
Lauren Eyler Dang
2
Lawrence Corey
2
Lily Zhang
2
Lin Ma
2
Ling Zhang
2
Linh Tran
2
Lu Wang
2
Lukas Andereggen
2
M. A. Adam
2
M. Beydoun
2
M. Brooks
2
M. Chandra
2
M. Dellenbach
2
M. Dibonaventura
2
M. Dougados
2
M. Drakos
2
M. Eberg
2
M. Fox
2
M. Glymour
2
M. Goma
2
M. Hassanzadeh
2
M. Hindborg
2
M. Hivert
2
M. Ho
2
M. Hudgens
2
M. Léger
2
M. M. Luedi
2
M. McElrath
2
M. Mossanen
2
M. Moukheiber
2
M. Petukhova
2
M. Popović
2
M. Rahman
2
M. Reilly
2
M. Sobieszczyk
2
M. Suh
2
M. Taniuchi
2
M. Woo Kinshella
2
Magdalena Cerdá
2
Mara A McAdams-DeMarco
2
Marc Rosen
2
Marcela Horvitz-Lennon
2
Marie Riviere
2
Mario H. Vargas
2
Mario Hevesi
2
Mark D Tyson
2
Markus Huber
2
Martin Blomberg Jensen
2
Mary E. Westerman
2
Matthew Miller
2
Matthieu Legrand
2
Max Kates
2
Md Ohedul Islam
2
Md. Saheen Hossen
2
Melissa A. Haendel
2
Melissa Spröesser Alonso
2
Menglan Pang
2
Michael A Horberg
2
Michael Ayebare
2
Milena Maule
2
Mohammad Alauddin
2
Momenul Haque Mondol
2
N. C. Fernandes
2
N. Gandhi
2
N. Hamzehpour
2
N. Yates
2
Nadir Sella
2
Nancy Reynolds
2
Neal D. Shore
2
Nebal S. Abu Hussein
2
Nelly López
2
Nicholas P. Jewell
2
Nuno Sepúlveda
2
O. Hyrien
2
Olivier Labayle
2
Omar Al-Heeti
2
Opeoluwa Owoyele
2
P. Bradshaw
2
P. Cislo
2
P. Clare
2
P. Heagerty
2
P. Hlavacek
2
P. Mutsuddi
2
P. Neuvial
2
P. Pal
2
P. Pradhan
2
P. Zhu 朱
2
P. Zivich
2
Pandi Li
2
Parth K. Modi
2
Peter O. Otieno
2
Peter Ssebutinde
2
Piero Fariselli
2
Pietro Ravani
2
Pratik Choudhary
2
Qiang 强 Tao 陶
2
R. Daniel
2
R. Garofoli
2
R. Grieve
2
R. Kantor
2
R. Koup
2
R. Lenain
2
R. Machekano
2
R. Pratley
2
R. Remien
2
R. Wong
2
R. Wood
2
Rajesh Bansode
2
Rana Miah
2
Rian J. Dickstein
2
Richard Liu
2
Richard Moffitt
2
Rishabh Jain
2
Robert Gross
2
Ruhollah Taghizadeh‐Mehrjardi
2
Ruth H. Keogh
2
S. Akther
2
S. C. Verma
2
S. Daneshmand
2
S. Jafarzadeh
2
S. Karuna
2
S. Krikov
2
S. L. Famida
2
S. Lanzinger
2
S. Liao
2
S. Nedjat
2
S. Park
2
S. Purushotham
2
S. Razzak
2
S. Shade
2
S. Woldu
2
S. Yahyavi
2
Saad M. Darwish
2
Safoora Gharibzadeh
2
Sam S. Chang
2
Sandesh Risal
2
Sarah Forrest
2
Sarah M Gildea
2
Sarah T. Alauddin
2
Scott M. Gilbert
2
Seth A. Berkowitz
2
Seunghye Lee
2
Shahab Hosseini
2
Shahjahan Ali
2
Shaker M Eid
2
Shaoli Zhou
2
Sharon-Lise T. Normand
2
Sheena Christabel Pravin
2
Sheetal S Sawant
2
Shu-Sen Chang
2
Shuo Wang
2
Simone A. Ludwig
2
Sixing Chen
2
Srilatha Edupuganti
2
Stefanie Do
2
Stefanie Schmid
2
Stella Kabageni
2
Stephen R. Walsh
2
Steven Young
2
Stijn Vansteelandt
2
Sulaiman Moukheiber
2
Sung Min Kim
2
Sunita Dhingra
2
Séamus Lankford
2
T. Abdou
2
T. Athni
2
T. Bivalacqua
2
T. Feike
2
T. Holtz
2
T. Odeny
2
T. Scholten
2
T. Schuster
2
T. VanderWeele
2
T. Vo
2
T. Woodruff
2
Thomas J. Guzzo
2
Til Stürmer
2
Tracey L. Yap
2
Trung-Kien Nguyen
2
Tullika Garg
2
U. Cooray
2
U. Pata
2
V. Jain
2
V. Ress
2
Valence Mfitumukiza
2
Vanessa Didelez
2
Victor De Gruttola
2
Vincent X. Liu
2
Viraj A Master
2
W. Ahrens
2
W. Hao
2
W. Ng’ambi
2
Wei Zhang
2
Wenbo Wu
2
Wookyung Chung
2
Xiaoru Sun
2
Xichao Wang
2
Xin Wang
2
Xin Zhou
2
Y. Crider
2
Y. Interian
2
Y. Zhang
2
Y. Zhao
2
Yang Liu
2
Yaning Feng
2
Yeji Kim
2
Yihao Liu
2
Ying Huang
2
Yiting Li
2
Yixin Fang
2
Yongfei Dong
2
Youmi Suk
2
Yu-jeong Song
2
Yun Jung Oh
2
Yunda Huang
2
Zaixiang Tang
2
Ze-Yu Wang
2
Zheng Liu
2
Zhengping Che
2
Ziqing Hei
2
Ziyue Wu
2
Ângela Jornada Ben
2
Özer Depren
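
The ranking above can be approximated with a simple per-author tally. This is a sketch under stated assumptions, not the site's actual code: each paper record is assumed to carry an `authors` list of name strings, and `rank_contributors` is a hypothetical helper. Ties are broken by name, which appears consistent with the ordering shown above.

```python
from collections import Counter


def rank_contributors(papers, min_papers=2):
    """Return (name, paper_count) pairs with at least min_papers, highest count first."""
    counts = Counter()
    for paper in papers:
        # set() guards against a name repeated within one paper's author list.
        counts.update(set(paper["authors"]))
    ranked = [(name, n) for name, n in counts.items() if n >= min_papers]
    # Sort by descending count, then by name for a stable order among ties.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical records for illustration only.
example = [
    {"authors": ["Author A", "Author B"]},
    {"authors": ["Author A", "Author C"]},
    {"authors": ["Author A", "Author B", "Author B"]},
]
print(rank_contributors(example))
```

Because a tally like this matches on exact name strings, spelling variants of one person are counted separately, which would explain entries such as "Karla Diaz-Ordaz" and "Karla DiazOrdaz" both appearing in the list.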


Papers

Papers are grouped by publication year. Only papers from 2006 onward are shown (plus records with unknown year).

2026 (21 papers)
2026-02-19 — genriesz: A Python Package for Automatic Debiased Machine Learning with Generalized Riesz Regression

Authors: Masahiro Kato
Year: 2026
Publication Date: 2026-02-19
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Efficient estimation of causal and structural parameters can be automated using the Riesz representation theorem and debiased machine learning (DML). We present genriesz, an open-source Python package that implements automatic DML and generalized Riesz regression, a unified framework for estimating Riesz representers by minimizing empirical Bregman divergences. This framework includes covariate balancing, nearest-neighbor matching, calibrated estimation, and density ratio estimation as special cases. A key design principle of the package is automatic regressor balancing (ARB): given a Bregman generator $g$ and a representer model class, genriesz automatically constructs a compatible link function so that the generalized Riesz regression estimator satisfies balancing (moment-matching) optimality conditions in a user-chosen basis. The package provides a modular interface for specifying (i) the target linear functional via a black-box evaluation oracle, (ii) the representer model via basis functions (polynomial, RKHS approximations, random forest leaf encodings, neural embeddings, and a nearest-neighbor catchment basis), and (iii) the Bregman generator, with optional user-supplied derivatives. It returns regression adjustment (RA), Riesz weighting (RW), augmented Riesz weighting (ARW), and TMLE-style estimators with cross-fitting, confidence intervals, and $p$-values. We highlight representative workflows for estimation problems such as the average treatment effect (ATE), ATE on treated (ATT), and average marginal effect estimation. The Python package is available at https://github.com/MasaKat0/genriesz and on PyPI.

2026-02-18 — HAL-MLE Log-Splines Density Estimation (Part I: Univariate)

Authors: Yilong Hou, Zhengpu Zhao, Yi Li, M. V. D. Laan
Year: 2026
Publication Date: 2026-02-18
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
We study nonparametric maximum likelihood estimation of probability densities under a total variation (TV) type penalty, sectional variation norm (also named as Hardy-Krause variation). TV regularization has a long history in regression and density estimation, including results on $L^2$ and KL divergence convergence rates. Here, we revisit this task using the Highly Adaptive Lasso (HAL) framework. We formulate a HAL-based maximum likelihood estimator (HAL-MLE) using the log-spline link function from \citet{kooperberg1992logspline}, and show that in the univariate setting the bounded sectional variation norm assumption underlying HAL coincides with the classical bounded TV assumption. This equivalence directly connects HAL-MLE to existing TV-penalized approaches such as local adaptive splines \citep{mammen1997locally}. We establish three new theoretical results: (i) the univariate HAL-MLE is asymptotically linear, (ii) it admits pointwise asymptotic normality, and (iii) it achieves uniform convergence at rate $n^{-(k+1)/(2k+3)}$ up to logarithmic factors for the smoothness order $k \geq 1$. These results extend existing results from \citet{van2017uniform}, which previously guaranteed only uniform consistency without rates when $k=0$. We will include the uniform convergence for general dimension $d$ in the follow-up work of this paper. The intention of this paper is to provide a unified framework for the TV-penalized density estimation methods, and to connect the HAL-MLE to the existing TV-penalized methods in the univariate case, despite that the general HAL-MLE is defined for multivariate cases.

2026-02-12 — Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation

Authors: Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang
Year: 2026
Publication Date: 2026-02-12
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the learned outcome regression models. We propose D3-Net, a framework that mitigates error propagation in ICE training and then applies a robust final correction. First, to interrupt error propagation during learning, we train the ICE sequence using Sequential Doubly Robust (SDR) pseudo-outcomes, which provide bias-corrected targets for each regression. Second, we employ a multi-task Transformer with a covariate simulator head for auxiliary supervision, regularizing representations against corruption by noisy pseudo-outcomes, and a target network to stabilize training dynamics. For the final estimate, we discard the SDR correction and instead use the uncorrected nuisance models to perform Longitudinal Targeted Minimum Loss-Based Estimation (LTMLE) on the original outcomes. This second-stage, targeted debiasing ensures robustness and optimal finite-sample properties. Comprehensive experiments demonstrate that our model, D3-Net, robustly reduces bias and variance across different horizons, counterfactuals, and time-varying confoundings, compared to existing state-of-the-art ICE-based estimators.

2026-02-11 — Highly Adaptive Principal Component Regression

Authors: Mingxun Wang, Alejandro Schuler, M. V. D. Laan, Carlos García Meixide
Year: 2026
Publication Date: 2026-02-11
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
The Highly Adaptive Lasso (HAL) is a nonparametric regression method that achieves almost dimension-free convergence rates under minimal smoothness assumptions, but its implementation can be computationally prohibitive in high dimensions due to the large basis matrix it requires. The Highly Adaptive Ridge (HAR) has been proposed as a scalable alternative. Building on both procedures, we introduce the Principal Component based Highly Adaptive Lasso (PCHAL) and Principal Component based Highly Adaptive Ridge (PCHAR). These estimators constitute an outcome-blind dimension reduction that offers substantial gains in computational efficiency and matches the empirical performance of HAL and HAR. We also uncover a striking spectral link between the leading principal components of the HAL/HAR Gram operator and a discrete sinusoidal basis, revealing an explicit Fourier-type structure underlying the PC truncation.

2026-02-10 — Prescribing of medication to prevent glucocorticoid harms in patients with Polymyalgia Rheumatica: a cross-sectional study and two emulated target trials in the Clinical Practice Research Datalink Aurum.

Authors: H. Twohig, David Jenkinson, J. Bailey, S. Hider, Ian C. Scott, Sara Muller
Year: 2026
Publication Date: 2026-02-10
Venue: Arthritis & Rheumatology
DOI: 10.1002/art.70087
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
OBJECTIVES Polymyalgia rheumatica (PMR) is a common indication for long-term glucocorticoid (GC) treatment. Bone- and gastro-protective medications are recommended for those at high-risk of adverse events from GCs but no trials have evaluated their effectiveness in PMR. We describe bone-/gastro-protective medicine prescribing in people with PMR and evaluate its impact on adverse GC outcomes using a target trial approach. METHODS A sample of >40,000 individuals aged ≥50 years, with a coded PMR diagnosis from January 2010-March 2022, prescribed GCs within 21 days of first PMR diagnosis code, was constructed in CPRD Aurum. Prescriptions were defined as prevalent (pre-PMR diagnosis), incident (at diagnosis), or late (post-diagnosis, still GC-treated), reported stratified by age/gender/deprivation. A target-trial approach assessed the effect of: a) bisphosphonates on fragility fractures and b) proton-pump inhibitors/H2-receptor antagonists (PPIs/H2RAs) on gastrointestinal (GI) ulceration/bleeding. Treatment effect, adjusted for confounders, was modelled using targeted maximum likelihood estimation. RESULTS 67.2% were co-prescribed bisphosphonates and 78.6% PPIs/H2RAs. Males and those in more deprived areas were less likely to receive bisphosphonates. 1.40% (95%CI 1.10%,1.70%) of those prescribed vs 2.32% (2.12%,2.52%) of those not prescribed bisphosphonates for 12 months experienced a fracture (risk difference 0.92% points [0.56%,1.27%], NNT 109). Prescribing gastro-protective medications was not associated with serious GI events. CONCLUSION Rates of prescribing to mitigate GC harms are higher than previously reported. Bisphosphonates are associated with approximately one less fragility fracture per year for every 100 people treated. Gastro-prophylaxis is not associated with reduced risk of GI ulceration/bleeding, suggesting potential to reduce prescribing for this indication.
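
The number needed to treat (NNT) quoted in this abstract follows from the two reported fracture risks. A minimal sketch of that arithmetic, using only the figures stated above and the usual convention of rounding the NNT up to a whole person (`number_needed_to_treat` is a hypothetical helper name):

```python
import math


def number_needed_to_treat(risk_untreated, risk_treated):
    """NNT = 1 / absolute risk reduction, rounded up to a whole person."""
    risk_difference = risk_untreated - risk_treated
    if risk_difference <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    return math.ceil(1 / risk_difference)


# Fracture risks from the abstract: 2.32% without vs 1.40% with bisphosphonates.
nnt = number_needed_to_treat(0.0232, 0.0140)
print(nnt)
```

With a risk difference of 0.92 percentage points, 1 / 0.0092 is about 108.7, which rounds up to the reported NNT of 109.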

2026-02-09 — Intravenous amino acid supplementation reduces 28-day mortality in sepsis: a retrospective cohort study from MIMIC-IV database and Mendelian randomization analysis.

Authors: Qinxue Wang, Yuanze Ma, Yuhan Zhao, Jiawei Wang, Yi Han, Haobing Huang
Year: 2026
Publication Date: 2026-02-09
Venue: British Journal of Nutrition
DOI: 10.1017/S0007114526106461
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Sepsis-related deaths remain prevalent in intensive care settings, with metabolic dysregulation as a key contributor. Although amino acid supplementation has shown promise, its clinical effectiveness in sepsis is unclear. This study evaluated the impact of intravenous amino acid administration on 28-day mortality in intensive care unit (ICU) sepsis patients using retrospective cohort analysis and Mendelian randomization (MR). We analyzed data from the MIMIC-IV database, matching 726 patients (363 per group) using propensity scores. The association between amino acid supplementation and mortality was assessed using Logistic regression, Cox regression, and targeted maximum likelihood estimation (TMLE). Two-sample MR was used to explore causal links between 20 common amino acids and sepsis mortality. In the cohort analysis, amino acid supplementation was consistently associated with significantly reduced 28-day mortality across all analytical methods (logistic regression: OR = 0.48, p < 0.01; Cox regression: HR = 0.48, p < 0.01; TMLE: ATE = -0.102, p < 0.01). In contrast, the MR analysis did not find a significant causal association for any single amino acid after correction for multiple comparisons; although glycine showed a nominal protective signal, it did not remain significant after FDR correction. This dual-method study demonstrates a strong association between compound amino acid infusions and reduced mortality in sepsis but did not identify any single amino acid as a robust causal mediator. These findings suggest the benefit may arise from a synergistic effect, highlighting the need for randomized controlled trials to validate these observational results and optimize nutritional strategies.

2026-02-05 — Integrating causal inference and machine learning to quantify climate-malaria relationships: Evidence of temperature and rainfall thresholds from Colombian municipalities

Authors: Juan David Gutiérrez
Year: 2026
Publication Date: 2026-02-05
Venue: PLOS Global Public Health
DOI: 10.1371/journal.pgph.0005925
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Rainfall and temperature are key climate determinants of malaria incidence; yet their causal exposure-response curves on malaria incidence across the entire Colombian territory remain unquantified. We estimated the effects of rainfall and temperature on malaria incidence at the municipal scale from 2007 to 2023. We conducted an ecological observational study in 969 Colombian municipalities located below 1,600 meters. The monthly Standardized Incidence Ratio (SIR) of malaria was calculated for each municipality. Directed acyclic graphs guided the identification of the appropriate adjustments needed to emulate the corresponding experimental design and avoid inducing bias, and the effect was estimated using a modified approach of the Targeted Maximum Likelihood Estimation (TMLE). Exposure-response curves were estimated for two outcomes: the current month and the moving average for the current and previous month. A total of 1,075,112 cases of malaria were reported. The results suggest a non-linear relationship between rainfall and temperature concerning the SIR of malaria, indicating an optimal temperature of 25 °C and approximately 37 mm of rainfall for the highest incidence. The negative control test revealed the presence of residual confounding bias (p < 0.05) in all estimates. Meanwhile, the estimations of the E-value indicated low to moderate tolerance (E-value = 1.14 – 1.48) to an unmeasured confounder. These findings support the integration of rainfall and temperature thresholds into early-warning systems for targeted malaria control.

2026-01-29 — Estimating the causal effect of sugar consumption on dental decay: a longitudinal targeted maximum likelihood estimation study

Authors: P. Santiago, X. Ju, L. Jamieson, H. Elani
Year: 2026
Publication Date: 2026-01-29
Venue: AJE Advances: Research in Epidemiology
DOI: 10.1093/ajeadv/uuag003
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Dental decay in permanent teeth is the most prevalent disease worldwide, with 54% of young people under the age of 18 having experienced it. Despite these findings, there have been no studies that investigated the causal effects of time-varying exposure to higher sugar consumption throughout childhood on dental decay in late adolescence. We investigated the causal effects of sustained higher sugar consumption, cumulative sugar consumption, and sugar consumption trajectories from ages 4 to 14 on the risk of ever experiencing dental decay at age 16. We used data from the Longitudinal Study of Australian Children, an ongoing national Australian study that started in 2004, with a sample of 4,671 young people. Causal effects were estimated using longitudinal Targeted Maximum Likelihood Estimation combined with the Super Learner ensemble. Young people with sustained higher sugar consumption (ie, above-median sugar consumption at ages 4, 6, 8, 10, 12, and 14) throughout the study period had a 37 percentage point higher risk of dental decay compared to those with no exposure. Each additional exposure to higher sugar consumption (ie, additional above-median sugar consumption at a certain age) between ages 4 and 14 was associated with a 6% increase in the relative risk of dental decay by age 16. This study provides causal evidence linking higher sugar consumption throughout childhood to dental decay in late adolescence.

2026-01-27 — Sodium Correction Rates and Associated Outcomes Among Patients With Severe Hyponatremia: A Retrospective Cohort Study.

Authors: D. G. Mark, Mubarika Alavi, J. Nugent, Mary Reed
Year: 2026
Publication Date: 2026-01-27
Venue: Annals of Internal Medicine
DOI: 10.7326/ANNALS-25-03676
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND Slow correction of severe hyponatremia is recommended to prevent osmotic demyelination syndrome but is associated with higher mortality. OBJECTIVE To examine the association between sodium correction rates and death or delayed neurologic events. DESIGN Retrospective cohort study. SETTING Twenty-one community hospitals of an integrated health system in northern California. PATIENTS Adults hospitalized with a serum sodium level of 120 mEq/L or lower between 2008 and 2023. INTERVENTION Maximum 24-hour rate of serum sodium correction (slow [<8 mEq/L], medium [8 to 12 mEq/L], or fast [>12 mEq/L; reference]). MEASUREMENTS The primary outcome was a composite of 90-day death or delayed neurologic events (new demyelination, paralysis, epilepsy, or altered consciousness between 3 and 90 days from admission). Standardized risk differences (RDs) were generated using targeted maximum likelihood estimation. Heterogeneity of effect was assessed across grades of predicted risk. RESULTS 13 988 patients were hospitalized with severe hyponatremia during the study period (median age, 74 years; 63% female). Comorbidities included congestive heart failure (24%), liver disease (18%), alcohol dependence (14%), and metastatic cancer (10%). The primary outcome occurred in 3000 patients (21%); 90-day death occurred in 2554 (18%), and 90-day delayed neurologic events occurred in 587 (4%). Compared with slow 24-hour sodium correction, both medium (RD, -5.6 percentage points [95% CI, -7.1 to -4.0 percentage points]) and fast (RD, -9.0 percentage points [CI, -11.1 to -6.9 percentage points]) correction rates were associated with lower adjusted risk for the primary outcome. Risk differences increased with higher predicted risk, whereas risk ratios remained similar. LIMITATIONS Residual confounding; outcome ascertainment using diagnostic codes. CONCLUSION Faster sodium correction is associated with lower risk for 90-day death or delayed neurologic events. 
Treatment guidelines should be reexamined. PRIMARY FUNDING SOURCE The Permanente Medical Group Rapid Analytics Unit Program.

2026-01-26 — Contribution of Acute Kidney Injury After Liver Transplant in Development of Chronic Kidney Disease: A Single-Center Retrospective Cohort Study.

Authors: Nicholas V. Mendez, Daniel Chan, Ty Thompson, David Chen, Sebastian Zeiner, Rishi P. Kothari, Hillary J Braun, M. Bokoch, Kerstin Kolodzie, Dieter Adelmann
Year: 2026
Publication Date: 2026-01-26
Venue: Anesthesia and Analgesia
DOI: 10.1213/ANE.0000000000007911
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND Acute kidney injury (AKI) is common after liver transplant and associated with increased morbidity and mortality. Transplantation of nonrenal organs is also associated with eventual chronic kidney disease (CKD). Development of CKD after liver transplant is known to be multifactorial; however, this study evaluates the unique contribution of AKI in this complex disease pathway. METHODS Patients were classified into 2 groups: presence or absence of severe AKI within 72 hours postoperatively. Kidney function was assessed at year 1: normal/mild (estimated glomerular filtration rate [eGFR] ≥60 mL/min/1.73 m²); moderate (30 ≤eGFR <60 mL/min/1.73 m²); or severe (eGFR <30 mL/min/1.73 m²) disease. Adjusted relative risks of both CKD and death at years 1 through 3 in the presence versus absence of severe AKI were estimated using discrete-time targeted maximum likelihood estimation. RESULTS Of 1574 patients, 769 (49%) experienced severe AKI. At year 1, 1024 (65%) patients had normal/mild, 487 (31%) had moderate, and 63 (4%) had severe CKD. The unadjusted relative risk of severe CKD was 3.66 (95% confidence interval [CI], 2.15-7.33), and the adjusted relative risk was 2.62 (95% CI, 1.61-4.28) in patients with severe AKI. In total, 66 (4%), 115 (7%), and 147 (9%) patients died in years 1, 2, and 3, respectively. Patients with severe AKI had an unadjusted relative risk of death at year 1 of 2.41 (95% CI, 1.47-4.19) compared to an adjusted relative risk of 1.15 (95% CI, 1.04-1.28); at year 2, the unadjusted relative risk of death was 1.51 (95% CI, 1.07-2.19) compared to an adjusted relative risk of 1.14 (95% CI, 1.04-1.25); and at year 3, the unadjusted relative risk of death was 1.44 (95% CI, 1.05-1.97) compared to an adjusted relative risk of 1.13 (95% CI, 1.04-1.23). CONCLUSION Severe postoperative AKI is associated with an increased risk of severe CKD at 1 year and mortality up to 3 years after liver transplant. Postoperative AKI represents an important target for future perioperative interventions aimed at mitigating the risk of long-term morbidity and mortality for liver transplant patients.

2026-01-22 — Super learner ensemble-based internal quality assessment of watermelon via integration of tapping acoustics and rind texture analysis

Authors: Ketsarin Chawgien, S. Kiattisin
Year: 2026
Publication Date: 2026-01-22
Venue: International journal of advances in soft computing and its applications
DOI: 10.15849/ijasca.v18i1.9
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Watermelon (Citrullus lanatus) is a widely cultivated fruit recognized for its high sugar content. Accurate detection of maturity and soluble solid content (SSC) is essential to ensure optimal harvest timing, sweetness, and market value, as well as to manage resource usage efficiently. This study introduces a low-cost, portable, and non-destructive approach for maturity classification and SSC estimation in Kinnaree watermelon by integrating tapping acoustics and rind texture analysis with ensemble learning algorithms. Tapping-induced acoustic signals were analyzed to extract key resonant features, while rind texture was quantified using image processing techniques. Selected features from both data sources, combined with watermelon mass, were utilized for three-class maturity classification and SSC regression modeling. Machine learning (ML) algorithms were used to map complex and nonlinear relationships between features and watermelon quality attributes. Results demonstrated that acoustic features and fruit mass were critical for maturity classification. Visual features were essential for SSC estimation. Super learner ensemble demonstrates superior predictive accuracy compared to other models, both in classifying ripeness and predicting the SSC of watermelons. Comparative studies with earlier methods confirmed the effectiveness and competitiveness of the proposed technology for non-destructive evaluation of watermelon quality.

2026-01-14 — Use of a veterinary therapeutic renal diet in cats with early chronic kidney disease is associated with slower disease progression and improved survival.

Authors: Michael Coyne, Donald Szlosek, Jenna Webeck, Rhaysa Feliciano, Noel Berger, Jason Doukas, David Denton, Louisa Yu Zhang, Natalee Holt, H. Michael, Allison L. O’Kell, Julia Riggott, Sarah L. Sweet, D. Mccrann
Year: 2026
Publication Date: 2026-01-14
Venue: Journal of the American Veterinary Medical Association
DOI: 10.2460/javma.25.10.0665
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective To determine disease progression and survival duration in cats diagnosed with early-stage chronic kidney disease (CKD) continuously treated with a veterinary therapeutic renal diet versus those untreated at diagnosis. Methods This retrospective study utilized a commercial database of medical records from veterinary practices located in Canada and the US. Cats born between January 1, 2010, and December 31, 2014, diagnosed with early-stage CKD were randomly selected. Records were reviewed to determine the date of diagnosis and whether treatment with a therapeutic renal diet was initiated. Progression of CKD and survival duration were evaluated with longitudinal targeted maximum likelihood estimation modeling. Results Of 1,430 cats with early CKD, 839 received a veterinary therapeutic renal diet and 591 did not. Dietary therapy was associated with reduced risk of progression. Treated CKD Stage 1 cats had a 45% lower hazard of progression (hazard ratio [HR], 0.55; 95% CI, 0.52 to 0.58). Treated CKD Stage 2 cats that had creatinine within and above the reference intervals had 46% (HR, 0.54; 95% CI, 0.50 to 0.58) and 41% (HR, 0.59; 95% CI, 0.56 to 0.62) lower hazards of progression, respectively. Cats treated with a therapeutic renal diet had a longer survival over 3 years: restricted mean survival time was 31.0 versus 26.0 months in untreated cats. Conclusions Use of a veterinary therapeutic renal diet in cats with early CKD slows disease progression and improves survival. Clinical Relevance Early diagnosis and intervention with a therapeutic renal diet may optimize long-term outcomes in cats with CKD.

2026-01-13 — Digital Asset Analytics for DeFi Protocol Valuation: An Explainable Optuna-Tuned Super Learner Ensemble Framework

Authors: Gihan M. Ali
Year: 2026
Publication Date: 2026-01-13
Venue: Journal of Risk and Financial Management
DOI: 10.3390/jrfm19010063
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Decentralized Finance (DeFi) has become a major component of digital asset markets, yet accurately valuing protocol performance remains difficult due to high volatility, nonlinear pricing dynamics, and persistent disclosure gaps that amplify valuation risk. This study develops an Optuna-tuned Super Learner stacked ensemble to improve risk-aware DeFi valuation, combining Extremely Randomized Trees (ETs), Support Vector Regression (SVR), and Categorical Boosting (CAT) as heterogeneous base learners, with a K-Nearest Neighbors (KNNs) meta-learner integrating their forecasts. Using an expanding-window panel time-series cross-validation design, the framework achieves significantly higher predictive accuracy than individual models, benchmark ensembles, and econometric baselines, obtaining RMSE = 0.085, MAE = 0.065, and R² = 0.97, representing a 25–36% reduction in valuation error. Wilcoxon tests confirm that these gains are statistically significant (p < 0.01). SHAP-based interpretability analysis identifies Gross Merchandise Volume (GMV) as the primary valuation determinant, followed by Total Value Locked (TVL) and key protocol design features such as Decentralized Exchange (DEX) classification, while revenue variables and inflation contribute secondary effects. The findings demonstrate how explainable ensemble learning can strengthen valuation accuracy, reduce information-driven uncertainty, and support risk-informed decision-making for investors, analysts, developers, and policymakers operating within rapidly evolving blockchain-based digital asset environments.

2026-01-13 — Comparing training window selection methods for prediction in non-stationary time series.

Authors: Fridtjof Petersen, Jonas M B Haslbeck, J. Tendeiro, Anna M. Langener, M. Kas, D. Rizopoulos, L. Bringmann
Year: 2026
Publication Date: 2026-01-13
Venue: British Journal of Mathematical & Statistical Psychology
DOI: 10.1111/bmsp.70018
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
The widespread adoption of smartphones creates the possibility to passively monitor everyday behaviour via sensors. Sensor data have been linked to moment-to-moment psychological symptoms and mood of individuals and thus could alleviate the burden associated with repeated measurement of symptoms. Additionally, psychological care could be improved by predicting moments of high psychopathology and providing immediate interventions. Current research assumes that the relationship between sensor data and psychological symptoms is constant over time - or changes with a fixed rate: Models are trained on all past data or on a fixed window, without comparing different window sizes with each other. This is problematic as choosing the wrong training window can negatively impact prediction accuracy, especially if the underlying rate of change is varying. As a potential solution we compare different methodologies for choosing the correct window size ranging from frequent practice based on heuristics to super learning approaches. In a simulation study, we vary the rate of change in the underlying relationship form over time. We show that even computing a simple average across different windows can help reduce the prediction error rather than selecting a single best window for both simulated and real world data.

2026-01-12 — A Unified Framework for Debiased Machine Learning: Riesz Representer Fitting under Bregman Divergence

Authors: Masahiro Kato
Year: 2026
Publication Date: 2026-01-12
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Estimating the Riesz representer is central to debiased machine learning for causal and structural parameter estimation. We propose generalized Riesz regression, a unified framework for estimating the Riesz representer by fitting a representer model via Bregman divergence minimization. This framework includes various divergences as special cases, such as the squared distance and the Kullback–Leibler (KL) divergence, where the former recovers Riesz regression and the latter recovers tailored loss minimization. Under suitable pairs of divergence and model specifications (link functions), the dual problems of the Riesz representer fitting problem correspond to covariate balancing, which we call automatic covariate balancing. Moreover, under the same specifications, the sample average of outcomes weighted by the estimated Riesz representer satisfies Neyman orthogonality even without estimating the regression function, a property we call automatic Neyman orthogonalization. This property not only reduces the estimation error of Neyman orthogonal scores but also clarifies a key distinction between debiased machine learning and targeted maximum likelihood estimation (TMLE). Our framework can also be viewed as a generalization of density ratio fitting under Bregman divergences to Riesz representer estimation, and it applies beyond density ratio estimation. We provide convergence analyses for both reproducing kernel Hilbert space (RKHS) and neural network model classes. A Python package for generalized Riesz regression is released as genriesz and is available at https://github.com/MasaKat0/genriesz.

2026-01-09 — Estimating optimal interpretable individualized treatment regimes from a classification perspective using adaptive LASSO

Authors: Yunshu Zhang, Shu Yang, Wendy Ye, I. Lipkovich, Douglas Faries
Year: 2026
Publication Date: 2026-01-09
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Real-world data (RWD) gains growing interests to provide a representative sample of the population for selecting the optimal treatment options. However, existing complex black box methods for estimating individualized treatment rules (ITR) from RWD have problems in interpretability and convergence. Providing an interpretable and sparse ITR can be used to overcome the limitation of existing methods. We developed an algorithm using Adaptive LASSO to predict optimal interpretable linear ITR in the RWD. To encourage sparsity, we obtain an ITR by minimizing the risk function with various types of penalties and different methods of contrast estimation. Simulation studies were conducted to select the best configuration and to compare the novel algorithm with the existing state-of-the-art methods. The proposed algorithm was applied to RWD to predict the optimal interpretable ITR. Simulations show that adaptive LASSO had the highest rates of correctly selected variables and augmented inverse probability weighting with Super Learner performed best for estimating treatment contrast. Our method had a better performance than causal forest and R-learning in terms of the value function and variable selection. The proposed algorithm can strike a balance between the interpretability of estimated ITR (by selecting a small set of important variables) and its value.

2026-01-09 — A Targeted Learning Framework for Estimating Restricted Mean Survival Time Difference using Pseudo-observations

Authors: Man Jin, Yixin Fang
Year: 2026
Publication Date: 2026-01-09
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
A targeted learning (TL) framework is developed to estimate the difference in the restricted mean survival time (RMST) for a clinical trial with time-to-event outcomes. The approach starts by defining the target estimand as the RMST difference between investigational and control treatments. Next, an efficient estimation method is introduced: a targeted minimum loss estimator (TMLE) utilizing pseudo-observations. Moreover, a version of the copy reference (CR) approach is developed to perform a sensitivity analysis for right-censoring. The proposed TL framework is demonstrated using a real data application.

2026-01-01 — What is the ideal glucose range for a patient with sepsis in the ICU? A retrospective analysis of MIMIC-IV

Authors: Tristan Struja, Lasse Hyldig Hansen, J. Matos, Josep Gómez, Àlex Pardo, Ismini Lourentzou, N. Hejazi, L. A. Celi, A. K. Waschka
Year: 2026
Publication Date: 2026-01-01
Venue: BMJ Open
DOI: 10.1136/bmjopen-2025-104916
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Importance Clinical trials have produced inconclusive results regarding the optimal glucose range for a patient with sepsis in the intensive care unit (ICU) receiving insulin treatment. Objective To investigate the optimal glucose range in patients with sepsis in the ICU independent of confounding covariates. Design Targeted trial emulation of glucose ranges using causal inference targeted maximum likelihood estimation and longitudinal mixed-effects models combined with survival models. Setting Single-centre, academic referral hospital in Boston, Massachusetts, USA. Participants Adults fulfilling sepsis 3 criteria with at least three glucose readings and insulin treatment from the Medical Information Mart for Intensive Care (MIMIC)-IV database (2008–2019). Exposure Five predefined glucose distributions with means at 100, 130, 160 (baseline), 190 and 220 mg/dL mimicking current guidelines’ recommendations (140–180 mg/dL). Main outcome and measure The primary outcome was in-hospital mortality. Modified counterfactual treatment-policy risks across distinct time-weighted glucose ranges were estimated. Results Of 73 181 eligible patients, 8002 patients with a median age of 66 years (41% women, 67% white ethnicity, 57% diabetes) were included. There was a U-shaped curve between glucose range and mortality in patients without diabetes, but overall, this association was not significant (mean glucose at 100 mg/dL with 21% mortality and mean glucose at 220 mg/dL with 26% mortality, p-for-trend 0.26). Mortality was lowest at 17%, with mean glucose between 130 and 160 mg/dL. Hypoglycaemic events (<80 mg/dL) became increasingly more frequent with tighter glucose control 16% at 220 mg/dL compared with 77% at 100 mg/dL (p-for-trend 0.01). Joint modelling corroborated these results and did not identify covariates that would favour lower glucose ranges in subsets of patients. Conclusion and relevance Our data suggest a U-shaped association of glucose and mortality with an optimal average glucose between 160 and 190 mg/dL. These results confirm current guideline recommendations. Together with recent results from randomised controlled trials, intensivists should aim for a liberal glucose range in most patients.

2026-01-01 — Prenatal paracetamol exposure and wheezing in infancy: a targeted maximum likelihood estimation application

Authors: C. Moccia, D. Zugna, M. Popović, G. Moirano, C. Pizzi, E. Migliore, Piero Fariselli, T. Sanavia, Franca Rusconi, A. Nybo Andersen, L. Richiardi, Milena Maule
Year: 2026
Publication Date: 2026-01-01
Venue: BMJ Open Respiratory Research
DOI: 10.1136/bmjresp-2024-002930
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Introduction Targeted maximum likelihood estimation (TMLE) is a semiparametric doubly‐robust estimator that integrates the SuperLearner in the estimation process, an ensemble method that allows us to model the exposure–outcome relationship combining multiple parametric and non-parametric methods. Aim We applied TMLE to assess the effect of maternal paracetamol use during the first trimester of pregnancy on child wheezing during the first 18 months of life, using data of the Italian NINFEA birth cohort. Methods We included three progressively larger sets of covariates for confounding adjustment. Set 1 included baseline socioeconomic and maternal characteristics, conditions and disorders. Set 2 additionally included maternal respiratory infections in the first pregnancy trimester. Set 3 added prepregnancy maternal mental health disorders. The effect was estimated with three TMLE implementations, differing in the methods used to model the exposure–outcome relationship: (1) parametric; (2) SuperLearner with parametric and semiparametric approaches and (3) SuperLearner with parametric, semiparametric and non-parametric approaches, and with hyperparameters tuning. We compared TMLE with multivariable regression, propensity score regression adjustment and inverse probability weighting. Results All methods provided similar results, suggesting a weak positive association that attenuated toward the null as progressively more covariates were adjusted for, from set 1 (TMLE 3: risk ratio, RR 1.15 (95% CI 1.03 to 1.29)) to set 3 (TMLE 3: RR 1.10 (95% CI 0.97 to 1.26), N=4099). Conclusions Such an association could be interpreted as a small positive effect or incomplete control for residual or unmeasured confounding, and its consistency across methods suggests it is unlikely to be driven by model misspecification.

2026 — Enhancing project financial performance prediction: An explainable machine learning framework integrating frontier efficiency and super learner

Authors: Gihan M. Ali
Year: 2026
Venue: Journal of Project Management
DOI: 10.5267/j.jpm.2025.10.003
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study investigates the role of frontier operational efficiency in predicting financial performance within Egypt’s emerging market. Data Envelopment Analysis (DEA) quantifies operational efficiency, and its predictive power is assessed within a machine learning (ML) framework, extending beyond traditional financial ratios. A Super Learner ensemble is developed, integrating Random Forest (RF) and Categorical Gradient Boosting (CatBoost) with a linear regression meta-learner. The Super Learner enhances accuracy and robustness by dynamically weighting and combining predictions from diverse base models, using a meta-learner to minimize error, reduce overfitting, and improve generalization. Empirical results demonstrate that incorporating DEA significantly improves predictive performance, increasing R² by 3.8% (t = 5.45, p < 0.01). The Super Learner achieves an R² of 0.612, with an RMSE of 0.061 and MAE of 0.046, outperforming both linear regression and state-of-the-art ML models. Feature importance analysis (via CatBoost) identifies net working capital (11.5%) and DEA efficiency (10.0%) as the top predictors. SHapley Additive exPlanations (SHAP) and partial dependence analyses further indicate that DEA efficiency, net working capital, and cash holdings exhibit positive but nonlinear associations with financial performance, while leverage demonstrates a concave, nonlinear relationship. These findings provide practical implications for investors, managers, and policymakers, highlighting the strategic value of operational efficiency. Additionally, the study introduces a scalable, interpretable framework combining frontier efficiency metrics with explainable ML, offering a robust tool for financial decision-making.

2026-01-01 — Effects of long-term time-weighted fine particulate matter components and ozone exposure on incident cardiovascular disease and the mediating roles of metabolic risk factors.

Authors: F. Wen, Aibin Qu, Bingxiao Li, Pandi Li, Han Qi, Ling Zhang
Year: 2026
Publication Date: 2026-01-01
Venue: NMCD. Nutrition Metabolism and Cardiovascular Diseases
DOI: 10.1016/j.numecd.2026.104573
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
BACKGROUND AND AIM Evidence has linked long-term outdoor air pollution with cardiovascular disease (CVD), while potential causal relationships of air pollutant exposure with CVD and mediation roles of metabolic risk factors remain under-explored. METHODS AND RESULTS We evaluated time-weighted exposure to fine particulate matter (PM₂.₅), ozone (O₃), and PM₂.₅ components among 21,102 participants from the CHCN-BTH cohort. Well-validated online databases, participants' outdoor activity durations, and pollutant infiltration factors were used to assess time-weighted exposure. We employed the targeted maximum likelihood estimation (TMLE) approach to estimate potential causal relationships between air pollutants and incident CVD. High-dimensional mediation analyses were used to further investigate the mediating roles of metabolic risk factors. Compared with exposures at first quartile concentration (Q1), participants in highest quartile of exposure (Q4) to air pollutants exhibited significantly increased risk of CVD incidence: PM₂.₅ (RR: 3.453, 95%CI: 2.674-4.460), warm-season O₃ (1.332, 1.016-1.746), black carbon (BC) (4.885, 2.866-8.327), ammonium (NH₄⁺) (1.959, 1.378-2.785), nitrate (NO₃⁻) (1.679, 1.117-2.525), sulfate (SO₄²⁻) (2.860, 2.211-3.701), and organic matter (OM) (4.070, 2.283-7.253). High-dimensional mediation analysis indicates that high-density lipoprotein-cholesterol (HDL-C) played a mediating role in the total effects, accounting for 10.46%, 24.96%, 39.35%, and 24.74% of PM₂.₅, BC, SO₄²⁻, and OM, respectively. Systolic blood pressure (SBP) mediated 13.41% of the total effect attributable to warm-season O₃. CONCLUSIONS This study provides potential causal linkage between air pollutants and CVD risk. Notably, our findings reveal roles of HDL-C and SBP in mediating the effects of CVD induced by air pollutant exposures.

2025 (157 papers)

2025-12-29 — Quantifying the effects of the Correa pathway from Helicobacter pylori infection to gastric cancer: causal inference found in 6.8 million Koreans

Authors: Hyun Jin Oh, C. Kim, Hyun-Wook Park, Bomi Park
Year: 2025
Publication Date: 2025-12-29
Venue: BMC Cancer
DOI: 10.1186/s12885-025-15507-9
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
The Correa pathway describes the sequential progression of gastric cancer through Helicobacter pylori (H. pylori) infection, chronic gastritis, atrophic gastritis, intestinal metaplasia, adenoma, and gastric cancer. Clarifying the causal relationships along this pathway can provide robust evidence for targeted gastric cancer preventive strategies. This study was conducted using data from the Korean National Health Insurance Service, including 6,863,103 individuals who participated in the National Cancer Screening Program for gastric cancer in 2018, with a 2-year follow-up. We used doubly robust targeted maximum likelihood estimation (TMLE) to quantify the total effect of H. pylori eradication on incident gastric cancer and applied causal mediation analysis (CMA) to evaluate indirect pathways through atrophic gastritis/intestinal metaplasia and adenoma. Analyses were adjusted for age, sex, income, smoking, alcohol use, and family history of gastric cancer. TMLE revealed that H. pylori infection significantly increased gastric cancer risk (relative risk [RR] = 6.40; 95% confidence interval [CI]: 6.05–6.77), as well as the risk of atrophic gastritis/intestinal metaplasia (RR = 1.41; 95% CI: 1.35–1.43) and adenoma (RR = 5.81; 95% CI: 5.68–5.94). Atrophic gastritis/intestinal metaplasia substantially elevated the risk of adenoma (RR = 1.72; 95% CI: 1.67–1.77) and gastric cancer (RR = 1.33; 95% CI: 1.28–1.44). CMA showed that 3% of the H. pylori effect on gastric cancer was mediated through atrophic gastritis/intestinal metaplasia (odds ratio [OR] = 1.03, 95% CI: 1.02–1.04), and 36% was mediated via adenoma (OR = 1.97, 95% CI: 1.94–2.01). Among individuals with atrophic gastritis/intestinal metaplasia, adenoma accounted for 44% of the subsequent gastric cancer risk (OR = 1.13, 95% CI: 1.03–1.34). Our findings demonstrate that H. pylori infection is the most important causal determinant of gastric cancer and adenoma, indicating that prevention of H. pylori is central to gastric cancer prevention.

2025-12-26 — A BSMOTE-OOA-SuperLearner Hybrid Framework for Interpretable Prediction of Pillar Stability

Authors: Weizhang Liang, Yu Liu, Pengpeng Lu, Zheng Li
Year: 2025
Publication Date: 2025-12-26
Venue: Symmetry
DOI: 10.3390/sym18010049
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Pillar stability prediction is essential for underground mining safety, yet it remains challenging due to limited data, class imbalance, and insufficient interpretability. This study proposes an integrated Borderline-SMOTE-Osprey Optimization Algorithm-Super Learner framework (BSMOTE-OOA-SL) for hard-rock pillar stability prediction. The framework combines five heterogeneous base learners (ANN, GBDT, KNN, RF, and SVM), applies Borderline-SMOTE within training folds to alleviate class imbalance, and employs the Osprey Optimization Algorithm (OOA) for systematic hyperparameter optimization. The model is evaluated using a dataset of 241 pillar cases from seven underground mines. Statistical experiments based on multiple random train–test splits show that the proposed framework consistently outperforms individual base learners in terms of Accuracy, Macro-Precision, Macro-Recall, and Macro-F1, demonstrating improved robustness and generalization. Ablation results indicate that the joint use of Borderline-SMOTE and OOA leads to quantitative performance gains of 10.21%, 12.25%, 12.61%, and 12.86% in Accuracy, Macro-Precision, Macro-Recall, and Macro-F1, respectively. Under a representative data split, the model achieves an overall accuracy of 95.92%, with strong class-wise Precision, Recall, and F1-score across all stability categories, and AUC values exceeding 0.9 for all classes (reaching 1.0 for the Failed category). SHAP-based interpretability analysis identifies stress-related indicators—particularly average pillar stress, Stress/UCS ratio, and UCS—as the dominant factors governing pillar stability. Overall, the proposed BSMOTE-OOA-SL framework provides a robust, interpretable, and statistically reliable solution for hard-rock pillar stability prediction.

2025-12-15 — Twelve-Month Results From the CISTO Study Comparing Radical Cystectomy Versus Bladder-Sparing Therapy for Recurrent High-Grade Non–Muscle-Invasive Bladder Cancer

Authors: John L. Gore, Erika M Wolff, Michael G Nash, Bryan A. Comstock, Scott M. Gilbert, Sam S. Chang, Stephanie Chisolm, Doug B MacLean, Jonathan L Wright, Max Kates, Kamal S. Pohar, Thomas J. Guzzo, T. Bivalacqua, K. Nepple, Jeffrey S. Montgomery, K. Scarpato, S. Woldu, Viraj A Master, David Y T Chen, M. Mossanen, S. Daneshmand, Brock B O'Neil, Mark D Tyson, Mary E. Westerman, Ashish M. Kamat, Ahmed M. Mansour, K. Chamie, Stephen B. Riggs, J. Kukreja, Parth K. Modi, Tullika Garg, C.C. Peyton, Jeffrey W. Nix, Rian J. Dickstein, A. Gadzinski, Alex Sankin, Neal D. Shore, Brian R Lane, J. C. Bassett, Sanjay Patel, David S. Morris, L. Macleod, Eugene K Lee, Chad R Ritch, Kristin M Follmer, Jenney R. Lee, Sung Min Kim, Larry G Kessler, Angela B. Smith
Year: 2025
Publication Date: 2025-12-15
Venue: Journal of Clinical Oncology
DOI: 10.1200/JCO-25-01324
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
PURPOSE To compare patient-reported and clinical outcomes between radical cystectomy (RC) and bladder-sparing therapy (BST) in patients with recurrent high-grade non–muscle-invasive bladder cancer (NMIBC). PATIENTS AND METHODS This pragmatic, prospective observational cohort study was designed with patients, who selected and prioritized outcomes. Eligible adults were candidates for both RC or BST, had previous induction Bacillus Calmette-Guérin (BCG), and received their last treatment within 12 months. The primary outcome was the EORTC-QLQ-C30 physical function scale at 12 months. Secondary outcomes included other EORTC-QLQ-C30 scales, depression, anxiety, bladder cancer–specific quality of life (QOL), financial burden, and cancer-specific outcomes. Targeted maximum likelihood estimation (TMLE) was used to calculate average treatment effect (ATE) estimates between arms. Inverse probability weighted risk ratios (wRR) were calculated using quasi-Poisson regression. RESULTS Of 570 participants (mean age 71.4 years; 21% female), 371 selected BST and 199 selected RC. Physical function was significantly worse in the RC arm at 3 months; by 9 months, there was no difference between arms, and at 12 months, physical function did not differ (ATE, 0.9; 95% CI, –0.6 to 2.4; P = .22). RC was associated with better emotional function, generic health-related QOL, and financial burden, and lower depression and anxiety, while BST was associated with better bowel and sexual health. Cancer-specific survival was 99% for BST versus 96% for RC (wRR, 0.99; 95% CI, 0.97 to 1.01). RC was associated with a higher risk of adverse events and serious adverse events, including a 90-day mortality rate of 2.5%. CONCLUSION Most patient-prioritized outcomes were similar or better among participants who chose RC compared with BST. These findings support the continued role of RC in managing recurrent high-grade NMIBC.

2025-12-15 — Machine learning to optimize precision in the analysis of randomized trials: A journey in pre-specified, yet data-adaptive learning

Authors: L. Balzer, Mark J. van der Laan, Maya L Petersen
Year: 2025
Publication Date: 2025-12-15
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Covariate adjustment is an approach to improve the precision of trial analyses by adjusting for baseline variables that are prognostic of the primary endpoint. Motivated by the SEARCH Universal HIV Test-and-Treat Trial (2013-2017), we tell our story of developing, evaluating, and implementing a machine learning-based approach for covariate adjustment. We provide the rationale for as well as the practical concerns with such an approach for estimating marginal effects. Using schematics, we illustrate our procedure: targeted machine learning estimation (TMLE) with Adaptive Pre-specification. Briefly, sample-splitting is used to data-adaptively select the combination of estimators of the outcome regression (i.e., the conditional expectation of the outcome given the trial arm and covariates) and known propensity score (i.e., the conditional probability of being randomized to the intervention given the covariates) that minimizes the cross-validated variance estimate and, thereby, maximizes empirical efficiency. We discuss our approach for evaluating finite sample performance with parametric and plasmode simulations, pre-specifying the Statistical Analysis Plan, and unblinding in real-time on video conference with our colleagues from around the world. We present the results from applying our approach in the primary, pre-specified analysis of 8 recently published trials (2022-2024). We conclude with practical recommendations and an invitation to implement our approach in the primary analysis of your next trial.

2025-12-12 — Ensemble-Based Gold Price Prediction Using Super Learner

Authors: K. Sudarshan, S. Sharan Karthik, K. Sudha
Year: 2025
Publication Date: 2025-12-12
Venue: 2025 10th International Conference on Smart Structures and Systems (ICSSS)
DOI: 10.1109/ICSSS66939.2025.11346376
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Proper prediction of gold prices is essential as a risk management tool, portfolio optimization, and strategic financial planning. Gold is one of the most prominent assets in global financial markets because of its status as a safe-haven, inflation-hedge, and long-term value holding asset. To address the intrinsic instability, large fluctuations, and complex dynamics of gold prices, this paper proposes a gold price prediction approach based on a Super Learner ensemble model, a system that builds on several algorithms and is able to take advantage of the strengths of each while reducing their individual drawbacks. The model integrates five different learners: Linear Regression (LR), Random Forest (RF), Gradient Boosting Machine (GBM), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN), each contributing its own predictive strength owing to the assumptions it makes and the algorithm it uses. indexes, stock market indexes, and exchange rates. Each base learner was trained on this augmented database, where the non-linear patterns and linear relations were captured, and the meta-learner of the ensemble combined these estimates in a stacking design with cross-validation to decrease overfitting and improve generalization. The performance of the Super Learner is assessed with extensive evaluation metrics, such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²), thereby validating the greater effectiveness of ensemble learning to address complex forecasting requirements within a financial setting.

2025-12-01 — Volunteering and Cardiovascular Biomarkers: A Machine-Learning Causal Approach in Health and Retirement Study

Authors: Seoyoun Kim, K. Shiba, Cal J. Halvorsen
Year: 2025
Publication Date: 2025-12-01
Venue: Innovation in aging
DOI: 10.1093/geroni/igaf122.203
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Prosocial engagement, such as volunteering, has been recognized for its protective effects against cardiovascular diseases (CVD), yet its causal impact on CVD-related biomarkers remains unclear. This study employs the targeted maximum likelihood estimator (TMLE), a doubly robust causal inference method, to estimate the effect of changes in volunteer activity (2006/2008 and 2010/2012) on seven key CVD-related biomarkers (in 2014/2016) using the Health and Retirement Study (N = 17,479). The biomarkers assessed include blood glucose (HbA1c), lipid regulation (TC/HDL ratio), chronic inflammation (CRP), kidney function (Cystatin C), blood pressure (systolic and diastolic), and BMI. Volunteering status was categorized into non-volunteers, initiators, stoppers, and sustained volunteers. TMLE results indicate that, had the entire population engaged in volunteering, we would expect improvements in CVD biomarkers compared to no volunteering at either wave. Sustained volunteering was associated with a 4.6% lower prevalence of unhealthy HbA1c levels (-0.046, p = 0.016), a 1.7% lower prevalence of an unhealthy TC/HDL ratio (-0.02, p = 0.001), an 8.1% lower prevalence of elevated CRP levels (-0.04, p = 0.0003), a 3.7% lower prevalence of unhealthy Cystatin C levels (-0.08, p < 0.001), and 8.9% and 8.7% lower prevalences of elevated diastolic and systolic blood pressure, respectively (DBP: -0.09, p < 0.001; SBP: -0.09, p < 0.001). However, a modest 1.2% increase in BMI was also observed. These findings highlight the potential of long-term volunteering as a population-level intervention to improve cardiovascular health, underscoring the need for further research to explore underlying mechanisms and long-term clinical benefits.

2025-12-01 — Robust prediction of multi-layer heat storage temperatures in an integrated solar pond–collector system using a Super Learner ensemble model

Authors: Faruk Kurker, I. Bozkurt, H. Sogukpinar, M. Karakilçik
Year: 2025
Publication Date: 2025-12-01
Venue: Journal of Energy Storage
DOI: 10.1016/j.est.2025.118720
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-12-01 — Cluster‐Level Analyses to Estimate a Risk Difference in a Cluster Randomized Trial With Confounding Individual‐Level Covariates: A Simulation Study

Authors: J. P. Pereira Macedo, B. Giraudeau
Year: 2025
Publication Date: 2025-12-01
Venue: Statistics in Medicine
DOI: 10.1002/sim.70341
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Cluster randomized trials (CRTs) may be analyzed using cluster‐level analyses. For binary outcomes, proportions are estimated for each cluster, and a risk difference can be estimated. The confidence interval is estimated using a Student distribution. However, in doing so, individual‐level characteristics are not adjusted for even though CRTs are known to be prone to recruitment/identification bias, possibly implying individual‐level confounders. With a simulation study, we compared cluster‐level analyses to estimate a risk difference for a two‐arm parallel CRT with individual‐level confounders and cluster‐level covariates. We considered the unadjusted (UN) method, two two‐stage procedure (TSP) methods considering a binomial or a Gaussian distribution, G‐computation (GC), and targeted maximum likelihood estimation (TMLE) methods. As expected, the UN method was biased. TSP methods were also biased for scenarios with a treatment effect when the number of clusters per arm was small. GC and TMLE methods were unbiased. For these latter methods, adjustment on only individual‐level covariates led to better performance measures (type I error rate, coverage rate and relative error of the standard error) than adjustment on both individual‐ and cluster‐level covariates. TSP, GC and TMLE had very similar results except in scenarios with a small number of clusters: Biased results for TSP methods and convergence problems for GC methods. In this case, TMLE should be preferred.

2025-12-01 — A comparison of allied healthcare versus no allied healthcare on participation, fatigue, physical functioning and health-related quality of life for patients with persistent complaints after a COVID-19 infection

Authors: Ângela Jornada Ben, Anita Natalia Varga, S. de Bruijn, Willem Bastiaan Dekker, C. C. van den Wijngaard, A. C. Verburg, T.J. Hoogeboom, P. J. van der Wees, R. Ostelo, Judith E. Bosmans, JM van Dongen
Year: 2025
Publication Date: 2025-12-01
Venue: Annals of Medicine
DOI: 10.1080/07853890.2025.2600139
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective: To assess the effectiveness of allied healthcare versus no allied healthcare. Materials and Methods: Data from the ParaCOV cohort (allied healthcare, n = 1,451) and the LongCOVID cohort (no allied healthcare/control, n = 1,427) were analyzed. Average treatment effects (ATEs) between groups were estimated using Targeted Maximum Likelihood Estimation adjusted for age, sex, body mass index, smoking status, comorbidities, and effect outcomes’ baseline values. A ≥ 10% between-group difference in improvement from baseline (BTGD) was considered clinically relevant for participation, fatigue, and physical functioning, and ≥0.062 for health-related quality of life. Results: Patients receiving allied healthcare were older (49.2 vs. 41.2 years), less often female (63.3% vs. 70.1%), had higher BMI (28.2 vs. 26.1), smoked less frequently (5.0% vs. 9.0%), had more comorbidities (49.2% vs. 41.9%), and lower baseline anxiety and depression scores compared to those not receiving allied healthcare. For participation, ATEs after 6 and 12 months were respectively −2.62 (95%CI: −4.39; −0.86) and −1.68 (95%CI: −4.81; 1.45), with BTGDs of 4.7% and 1.8% favoring the control. For fatigue, ATEs were 1.72 (95%CI: −0.14; 3.58) and 0.97 (95%CI: −1.48; 3.41), with BTGDs of 6.5% and 3.7% favoring the control. For physical functioning, ATEs were 5.75 (95%CI: 4.42; 7.09) and 6.36 (95%CI: 4.84; 7.88), with BTGDs of 1.4% and 2.2% favoring allied healthcare. For health-related quality of life, ATEs were 0.017 (95%CI: −0.008; 0.0044) and 0.033 (95%CI: 0.011; 0.054). Conclusions: Patients with persistent complaints after a COVID-19 infection showed significantly lower participation after 6 months, higher health-related quality of life after 12 months, and better physical functioning after 6 and 12 months of allied healthcare; however, BTGDs were not clinically relevant. Study limitations warrant cautious interpretation of the results.
Key messages: Although health-related quality of life and physical functioning improved in Long COVID patients, this cannot be definitively attributed to allied healthcare. The observed outcome differences between Long COVID patients with and without allied healthcare were not clinically relevant. More research is needed for tailored rehabilitation treatments for these patients.

2025-11-28 — Longitudinal correlation between cumulative remnant cholesterol inflammatory index and incident diabetes

Authors: Qiqiang Li, Li Gong, Hongxin Li
Year: 2025
Publication Date: 2025-11-28
Venue: Diabetology & Metabolic Syndrome
DOI: 10.1186/s13098-025-02018-7
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Diabetes is closely associated with dyslipidemia and inflammation. However, studies on the combined effects of inflammation and dyslipidemia on incident diabetes are lacking. Remnant cholesterol inflammation index (RCII) is a clinical indicator of inflammation and dyslipidemia. In this study, we aimed to investigate the longitudinal relationship between cumulative RCII (cumRCII) and incident diabetes. Data were sourced from the China Health and Retirement Longitudinal Study from 2011 to 2018. The mean age of the study participants was 58 years. CumRCII was treated as the exposure variable and incident diabetes as the outcome variable. Restricted cubic splines and logistic regression were used to evaluate the relationship between cumRCII and incident diabetes, and longitudinal targeted maximum likelihood estimation was used for sensitivity analysis. A total of 4,513 participants were included in the analysis. During a 7-year follow-up period, 302 cases of incident diabetes were diagnosed. Using the first quartile of cumRCII as the reference, the odds ratio (OR) for incident diabetes in the fourth quartile was 2.60 (95% confidence interval [CI], 1.74–3.87; P < 0.001). For each standard deviation increase in cumRCII, the OR for incident diabetes was 1.26 (95% CI, 1.16–1.37; P < 0.001). Sensitivity analysis yielded consistent conclusions, with the Net Reclassification Improvement (NRI) for the model incorporating cumRCII being 0.346 and the DeLong test showing statistical significance. We found a nonlinear relationship between cumRCII and incident diabetes; the threshold value of cumRCII with respect to incident diabetes was 43.39. CumRCII is closely associated with incident diabetes in adults aged ≥ 45 years, and a significant correlation was found between cumRCII and incident diabetes in the population with normal triglyceride levels. Furthermore, the longitudinal relationship between cumulative RCII and incident diabetes was non-linear.
CumRCII can be used as an early indicator of incident diabetes in a population with normal triglycerides. This study confirmed that cumulative remnant cholesterol inflammation index (cumRCII) is closely associated with incident diabetes in adults aged 45 years and older. Furthermore, a close correlation exists between cumRCII and incident diabetes in the population with normal triglyceride levels.

2025-11-28 — A Comparative Analysis in a Clinical Cohort: Multiple Imputation by Chained Equations and a Novel Super Learner-Based Imputation Approach

Authors: Tony Zbysinski, Lezhou Wu, Justin Dale, James Coates, Karan Sapiah, Jamie Reuben, F. Markson, U. Kulkarni, Nazmul Islam
Year: 2025
Publication Date: 2025-11-28
Venue: medRxiv
DOI: 10.1101/2025.11.26.25341082
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-11-27 — Impact of Tooth Preservation or Replacement on Diet Quality: Modified Treatment Policies Approach.

Authors: J.R.H. Tay, A. C. Kalhan, U. Cooray, Gustavo G Nascimento
Year: 2025
Publication Date: 2025-11-27
Venue: Journal of Periodontal Research
DOI: 10.1111/jre.70053
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Aim: To examine the impact of emulated natural tooth preservation or dental rehabilitation scenarios on diet quality among older adults in the United States, assessing what diet quality would be if people's dentition or rehabilitation status were improved. Methods: Data from 2606 participants aged ≥ 60 across four NHANES cycles (2011-2018) were analyzed. Dentition status and prosthetic rehabilitation were clinically assessed, and diet quality was measured using the Healthy Eating Index (HEI)-2020. The modified treatment policies (MTP) framework was used to emulate hypothetical interventions of retaining natural teeth or providing prosthetic rehabilitation. Odds ratios (OR) for achieving better diet quality under each scenario were estimated using doubly robust targeted maximum likelihood estimation (TMLE). Results: If all participants retained ≥ 25 teeth, the likelihood of achieving better HEI scores increased by 51% (OR = 1.51, 95% CI: 1.32-1.69). In a pragmatic scenario where each dentition group was shifted up by one level (edentulous → 1-9 teeth, 1-9 teeth → 10-19 teeth, 10-19 teeth → 20-25 teeth, 1-9 teeth → ≥ 25 teeth), the population had a 20% higher likelihood of achieving better diet quality (OR = 1.22, 95% CI: 1.14-1.29). Prosthetic rehabilitation yielded comparatively smaller effects, with rehabilitation in individuals with < 25 teeth improving the likelihood of better diet quality by 8% (OR = 1.08, 95% CI: 1.03-1.12). Conclusion: The present observational study suggests that preserving natural teeth has a stronger population-level impact on diet quality than prosthetic rehabilitation replacing missing teeth in older adults.

2025-11-27 — Are Neural Representation Learning Methods a Viable Alternative to TMLE for Causal Estimation?

Authors: Mohammad Ehsanul Karim, Zining Annie Wang
Year: 2025
Publication Date: 2025-11-27
Venue: Data Science in Science
DOI: 10.1080/26941899.2025.2583507
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2025-11-20 — The Effect of Treating Hearing Loss with Hearing Aids on Plasma Biomarkers of Alzheimer’s Disease and Related Dementias

Authors: Lachlan Cribb, Margarita Moreno-Betancur, J. Sarant, R. Wolfe, Matthew P Pase, Gary Rance, Michelle M Mielke, A. Murray, Alice J. Owen, Robyn L. Woods, Zhen Zhou, Zimu Wu, Kerry M. Sheets, T. Chong, Raj C. Shah, Joanne Ryan
Year: 2025
Publication Date: 2025-11-20
Venue: medRxiv
DOI: 10.1101/2025.11.19.25340558
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Promising evidence indicates that treating hearing loss with hearing aids (HAs) could reduce dementia risk. We extend this evidence by investigating the effect of HAs on plasma biomarkers of Alzheimer's disease and related dementias (ADRD). Methods: We emulated two target trials using observational data from Australian participants of the ASPREE study. Eligible participants had self-reported hearing problems, no past HA use, and were dementia-free. HA prescriptions and frequency of HA use were measured by questionnaire. Phosphorylated-tau181 (pTau181), neurofilament light chain (NfL), glial fibrillary acidic protein (GFAP), and amyloid-β (Aβ) 42/40 were measured after approximately 6-8 years. We estimated the effect of new HA prescription (first target trial) and the frequency of HA use (second target trial) using targeted maximum likelihood estimation, with multiple imputation for missing data. Results: Across imputed datasets, a median of 2842 eligible individuals were included (mean age 75 years, 48% female), with a median of 735 receiving a new HA prescription. Among survivors, the estimated mean differences comparing HA prescription and no HA prescription were 1.8 pg/mL (95% CI: -0.6, 4.1), 0.1 pg/mL (-7.8, 8.0), -2.2 pg/mL (-14.5, 10.1), and -0.7 (-2.6, 1.2) for the concentrations of pTau181, NfL, GFAP, and (Aβ42 × 1000)/Aβ40, respectively. Mean differences did not differ substantially across levels of potential baseline effect modifiers, including APOE-ε4 genotype and cognition. Conclusion: In community-dwelling older people with hearing loss and no dementia, we found minimal effects of HA prescription and frequency of HA use on plasma ADRD biomarkers after a 7-year follow-up.

2025-11-20 — Targeted Maximum Likelihood Estimation for Causal Inference With Observational Data-The Example of Private Tutoring.

Authors: Christoph Jindra, Karoline A. Sachse
Year: 2025
Publication Date: 2025-11-20
Venue: Multivariate Behavioral Research
DOI: 10.1080/00273171.2025.2561942
Link: Semantic Scholar
Matched Keywords: super learning, targeted maximum likelihood estimation, tmle

Abstract:
State-of-the-art causal inference methods for observational data promise to relax assumptions threatening valid causal inference. Targeted maximum likelihood estimation (TMLE), for example, is a template for constructing doubly robust, semiparametric, efficient substitution estimators, providing consistent estimates if the outcome or treatment model is correctly specified. Compared to standard approaches, it reduces the risk of misspecification bias by allowing (nonparametric) machine-learning techniques, including super learning, to estimate the relevant components of the data distribution. We briefly introduce TMLE and demonstrate its use by estimating the effects of private tutoring in mathematics during Year 7 on mathematics proficiency and grades using observational data from starting cohort 3 of the National Education Panel Study (N= 4,167). We contrast TMLE estimates to those from ordinary least squares, the parametric G-formula, and the augmented inverse-probability weighted estimator. Our findings reveal close agreement between methods for end-of-year grades. However, variations emerge when examining mathematics proficiency as the outcome, highlighting that substantive conclusions may depend on the analytical approach. The results underscore the significance of employing advanced causal inference methods, such as TMLE, when navigating the complexities of observational data and highlight the nuanced impact of methodological choices on the interpretation of study outcomes.
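Illustration (not from the paper): the double robustness mentioned in this abstract can be seen on a toy example. The sketch below assumes a single binary confounder W, a binary treatment A, and a binary outcome Y, with made-up counts. The initial outcome model is deliberately wrong (it ignores W), the treatment model is estimated within strata of W, and the targeting step is a one-parameter logistic fluctuation fitted by Newton iterations. Real analyses would use richer covariates and super learning for both models.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def tmle_ate(data, max_iter=50):
    """TMLE of E[Y(1)] - E[Y(0)] from binary (w, a, y) records."""
    n = len(data)
    # Treatment mechanism g(w) = P(A=1 | W=w), estimated within strata of W.
    g = {}
    for w in (0, 1):
        treated = [a for (wi, a, _) in data if wi == w]
        g[w] = sum(treated) / len(treated)
    # Deliberately crude initial outcome fit: E[Y | A=a], ignoring W.
    q_init = {}
    for a in (0, 1):
        outcomes = [y for (_, ai, y) in data if ai == a]
        q_init[a] = sum(outcomes) / len(outcomes)

    def clever(a, w):
        # "Clever covariate" for the average treatment effect.
        return a / g[w] - (1 - a) / (1.0 - g[w])

    def q_star(a, w, eps):
        # Logistic fluctuation of the initial fit along the clever covariate.
        return expit(logit(q_init[a]) + eps * clever(a, w))

    # Targeting step: Newton iterations for the fluctuation parameter.
    eps = 0.0
    for _ in range(max_iter):
        score = sum(clever(a, w) * (y - q_star(a, w, eps)) for w, a, y in data)
        info = sum(clever(a, w) ** 2 * q_star(a, w, eps) * (1.0 - q_star(a, w, eps))
                   for w, a, _ in data)
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break
    # Substitution (plug-in) estimate using the updated outcome fit.
    return sum(q_star(1, w, eps) - q_star(0, w, eps) for w, _, _ in data) / n

def make_rows(w, a, n_total, n_events):
    return [(w, a, 1)] * n_events + [(w, a, 0)] * (n_total - n_events)

# Hypothetical counts: treatment is more common when W = 1, and so is Y.
data = (make_rows(0, 1, 20, 10) + make_rows(0, 0, 60, 18)
        + make_rows(1, 1, 60, 48) + make_rows(1, 0, 20, 12))

treated_y = [y for _, a, y in data if a == 1]
control_y = [y for _, a, y in data if a == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
estimate = tmle_ate(data)
print(f"naive difference: {naive:.3f}, TMLE: {estimate:.3f}")
```

In these made-up counts the risk difference within each stratum of W is 0.2, while the unadjusted difference is 0.35. Because the treatment model is correct, the targeting step pulls the misspecified outcome fit back to 0.2.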

2025-11-19 — Hazard-Based Targeted Maximum Likelihood Estimation for Survival in Resampling Designs

Authors: Kirsten E. Landsiedel, Rachael V. Phillips, Maya L. Petersen
Year: 2025
Publication Date: 2025-11-19
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Survival is a key metric for evaluating standards of care for people living with HIV. In resource-limited settings, high rates of loss to follow-up (LTFU) often result in underestimation of mortality when only observed deaths are considered. Resampling, which tracks a subset of LTFU patients to ascertain their outcomes, mitigates bias and improves survival estimates. However, common estimators for survival in resampling designs, such as weighted Kaplan-Meier (KM), fail to leverage covariate information collected during repeated clinic visits, even though this information is highly predictive of survival. We propose a Targeted Maximum Likelihood Estimator (TMLE) for survival in resampling designs, which addresses these limitations by leveraging baseline and longitudinal covariates to achieve greater efficiency. Our TMLE is a plug-in estimator and is robust to misspecification of the initial model for the conditional hazard of death, guaranteeing consistency of our estimator due to known resampling probabilities. We present: (1) a fully efficient TMLE for data from resampling studies with fixed follow-up time for all participants and (2) an inverse probability of censoring weighted (IPCW) TMLE that accounts for varied follow-up times by stratifying on patients with sufficient follow-up to evaluate survival. This IPCW-TMLE can be made highly efficient through nonparametric or targeted estimation of the follow-up censoring mechanism. In simulations, our TMLE reduced variance by up to 55% compared with the commonly used weighted KM estimator while preserving nominal confidence interval coverage. These findings demonstrate the potential of our TMLE to improve survival estimation in resampling designs, offering a robust and resource-efficient framework for HIV research. Keywords: Resampling designs, Survival analysis, Targeted Maximum Likelihood Estimation, Inverse probability weighting

2025-11-17 — Super Learning Model and Cognitive Achievement Among First Grade Intermediate Students

Authors: Adnan Mayouf Ali
Year: 2025
Publication Date: 2025-11-17
Venue: Academia Open
DOI: 10.21070/acopen.10.2025.12938
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
General Background: History education is increasingly expected to move beyond memorization by promoting analytical thinking and meaningful engagement with historical contexts. Specific Background: The FATA Super Learning Model, which integrates focusing, activity-based exploration, guided training, and applied learning, has been suggested as an approach that can strengthen students’ cognitive and analytical skills, yet empirical evidence in early secondary history education is still limited. Knowledge Gap: Research examining teachers’ perceptions of the model’s effectiveness and its direct influence on cognitive achievement among middle-school students remains scarce. Aims: This study analyzes the effectiveness of the FATA model in enhancing cognitive achievement, historical reasoning, and classroom interaction among first-grade intermediate students in Iraq. Results: Responses from 50 history teachers show strong agreement that the model deepens understanding, increases motivation, improves critical analysis, strengthens the ability to link past and present, and enhances active participation, although challenges such as limited training, resource constraints, and insufficient institutional support were noted. Novelty: The study offers one of the earliest systematic evaluations of the FATA model within Iraqi middle-school settings, combining quantitative and qualitative data from practitioners. Implications: The findings underline the importance of structured training, curriculum alignment, and stronger institutional support to ensure effective and sustainable implementation of the model in history classrooms. Highlights: Highlights the model’s role in strengthening students’ deep understanding of historical events. Emphasizes improved motivation and active engagement during history learning. Underscores the model’s contribution to linking past events with present-day contexts.
Keywords: Super Learning, FATA Model, Teaching History, Cognitive Achievement, Critical Thinking

2025-11-15 — Comparative Mortality Risk of Aripiprazole, Olanzapine, Quetiapine and Risperidone in Alzheimer’s Disease: A Real World Cohort Study with Treatment Effect Heterogeneity Analysis

Authors: Chen Jiang, J. Krivinko, Zeshui Yu, Robert A Sweet, Lang Zeng, Hui Wang, Ying Ding, Zheng Zeng, Julia Kofler, Lirong Wang
Year: 2025
Publication Date: 2025-11-15
Venue: medRxiv
DOI: 10.1101/2025.11.13.25340096
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Second-generation antipsychotics (SGAs) are frequently used off-label to manage behavioral symptoms in Alzheimer's disease (AD), despite ongoing concerns about their safety. Comparative evidence on mortality risk across specific SGAs remains limited. Objective: To compare all-cause mortality among AD patients treated with commonly prescribed SGAs and to explore treatment effect heterogeneity using causal machine learning. Methods: We conducted a retrospective cohort study using de-identified electronic health records from the Truveta platform (2018-2024). Patients with incident AD initiating treatment with aripiprazole, risperidone, quetiapine, or olanzapine were identified using an active comparator, new-user design. Drug exposure was modeled as a time-varying covariate in Cox proportional hazards models, with propensity score matching applied to control for confounding. Causal tree and targeted maximum likelihood estimation (TMLE) were used to identify subgroups with heterogeneous treatment effects. Results: Among 17,004 AD patients, aripiprazole was associated with significantly lower mortality than olanzapine (HR = 0.667, 95% CI: 0.472-0.941) and quetiapine (HR = 0.677, 95% CI: 0.462-0.990). Quetiapine was also associated with lower mortality than olanzapine (HR = 0.833, 95% CI: 0.702-0.990) and risperidone (HR = 0.830, 95% CI: 0.705-0.978). Causal tree analysis revealed treatment effect heterogeneity by clinical characteristics, particularly among patients using type 2 diabetes (T2DM) medications. In subgroup analyses, aripiprazole remained protective in T2DM users (HR = 0.604 vs. quetiapine and risperidone, p = 0.002). Conclusions: Mortality risks vary substantially across SGAs in AD patients. Aripiprazole and quetiapine were associated with lower mortality compared to olanzapine and risperidone. Treatment effect heterogeneity suggests the need for individualized prescribing based on patient characteristics such as comorbid T2DM.

2025-11-11 — Estimating causal effects with machine learning: A guide for ecologists

Authors: Suchinta Arif
Year: 2025
Publication Date: 2025-11-11
Venue: Methods in Ecology and Evolution
DOI: 10.1111/2041-210X.70191
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In ecology, there is a growing need to move beyond correlations to uncovering causal effects from observational data. With the parallel increase in big data and machine learning algorithms, the opportunity now exists to benefit from causal machine learning methodologies. This paper presents an accessible overview of four causal machine learning methods, double machine learning (DML), targeted maximum likelihood estimation (TMLE), deep instrumental variables (Deep IV) and causal forests, that can be applied across ecological contexts. DML and TMLE leverage machine learning to estimate causal effects in the presence of known confounders. Deep IV offers a robust solution for addressing unmeasured confounding or bidirectional relationships by pairing valid instruments with deep neural networks. Causal forests uncover heterogeneity in causal effects, shedding light on context‐dependent ecological responses. Adding these causal machine learning techniques to an ecologist's broader causal toolkit will increase the options researchers have for estimating causal relationships, particularly when dealing with complex and large‐scale observational data.

2025-11-10 — Causal Inference for Network Autoregression Model: A Targeted Minimum Loss Estimation Approach

Authors: Yong Wu, Shuyuan Wu, Xinwei Sun, Xuening Zhu
Year: 2025
Publication Date: 2025-11-10
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We study estimation of the average treatment effect (ATE) from a single network in observational settings with interference. The weak cross-unit dependence is modeled via an endogenous peer-effect (network autoregressive) term that induces distance-decaying network dependence, relaxing the common finite-order interference to infinite interference. We propose a targeted minimum loss estimation (TMLE) procedure that removes plug-in bias from an initial estimator. The targeting step yields an adjustment direction that incorporates the network autoregressive structure and assigns heterogeneous, network-dependent weights to units. We find that the asymptotic leading term related to the covariates $\mathbf{X}_i$ can be formulated into a $V$-statistic whose order diverges with the network degrees. A novel limit theory is developed to establish the asymptotic normality under such complex network dependent scenarios. We show that our method can achieve smaller asymptotic variance than existing methods when $\mathbf{X}_i$ is i.i.d. generated and estimated with empirical distribution, and provide theoretical guarantees for estimating the variance. Extensive numerical studies and a live-streaming data analysis are presented to illustrate the advantages of the proposed method.

2025-11-04 — The role of cardiovascular disease as a mediator in mitigating the impact of ambient PM2.5 on dementia risk

Authors: Chengying Lin, Yechi Zhang, A. Dewan, Laura Forastiere, Kai Chen
Year: 2025
Publication Date: 2025-11-04
Venue: Environmental Epidemiology
DOI: 10.1097/EE9.0000000000000436
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Previous research suggested that stricter ambient particulate matter with an aerodynamic diameter less than or equal to 2.5 μm (PM2.5) standards would reduce the risk of dementia, but immediate reductions in PM2.5 levels are not always feasible. Limited studies have quantified the extent to which cardiovascular disease (CVD) interventions could mitigate the effect of not achieving stricter PM2.5 standards on dementia risk. Methods: This study included 283,813 participants in the UK Biobank cohort aged ≥60 years without dementia diagnosis or CVD hospitalization at baseline (2015). We applied the longitudinal targeted maximum likelihood estimation to estimate the total effect of no PM2.5 intervention compared with hypothetical PM2.5 interventions on the 5-year risk of dementia. We also estimated the controlled direct effect by setting all participants free from CVD hospitalization in both scenarios. Results: Compared with the hypothetical intervention of reducing PM2.5 exposure by 10% if it is above the standard of 10 µg/m3, the total effect of no PM2.5 intervention increased the 5-year dementia risk by 3.77 (95% confidence interval [CI] = 1.47, 4.39) cases per 1000 participants. By setting all participants free from CVD hospitalization, the controlled direct effect was an increase of 2.82 (95% CI = 0.74, 3.30) cases per 1000 participants. The proportion of the total effect that could be mitigated by preventing CVD hospitalization was 25.27% (95% CI = 17.49%, 49.42%) among all eligible participants. Conclusion: Part of the effect of not achieving stricter ambient PM2.5 standards on dementia risk could be mitigated by preventing CVD hospitalization in the UK Biobank cohort. This finding indicates the potential of promoting cardiovascular health to reduce dementia burden.
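Illustration (not from the paper): the proportion mitigated reported above can be roughly reproduced from the two rounded point estimates, assuming it is defined as (total effect − controlled direct effect) / total effect. That definition is an assumption here; the abstract does not state the formula.

```python
# Point estimates quoted in the abstract, in excess cases per 1000 participants.
total_effect = 3.77
controlled_direct_effect = 2.82

# Assumed definition: share of the total effect removed by preventing
# CVD hospitalization.
proportion_mitigated = (total_effect - controlled_direct_effect) / total_effect
print(f"{100 * proportion_mitigated:.1f}%")  # prints "25.2%"
```

The result is close to the reported 25.27%; the small gap is presumably due to the inputs being rounded in the abstract.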

2025-11-04 — Abstract 4335923: Added Value of Clinical Data Over Claims Data in Controlling for Confounding by Indication: A Case Example Assessing the Effectiveness of Community-Based Rehabilitation Therapy After Stroke

Authors: Shuqi Zhang, Janet K Freburger, Justin Trogdon, Charity Patterson, Molly Wen, Sara Jones
Year: 2025
Publication Date: 2025-11-04
Venue: Circulation
DOI: 10.1161/circ.152.suppl_3.4335923
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Observational studies that examine the comparative effectiveness of healthcare services often face challenges in controlling for confounding by indication. This study examines whether clinical data adds value over claims data alone in addressing this bias when evaluating the effectiveness of community-based physical or occupational therapy (PT/OT) after stroke. Methods: Medicare claims data from the 6 months prior to and including the index hospitalization were linked to clinical data of 5,244 stroke survivors discharged home from 40 North Carolina hospitals. Measures of stroke severity, comorbidities, and previous healthcare utilization were derived from the claims. Clinical measures included the National Institutes of Health Stroke Scale, stroke diagnosis categories, ambulatory status, comorbidities, and therapy need. We estimated the effectiveness of any PT/OT use versus no use within 30 days of discharge. The primary outcome was 90-day functional status after discharge. We used Targeted Maximum Likelihood Estimation (TMLE) with SuperLearner and Inverse Probability of Treatment Weighting (IPTW) respectively to control for confounding across claims-only, clinical-only, and two joint models, claims-based with unique clinical elements and clinical-based with unique claims elements. Results: Across all models in the full population (mean age, 74; 53% female; 78% Whites), receipt of any therapy within 30 days was unexpectedly associated with lower 90-day functional score (Figure 1). Models incorporating clinical data yielded more attenuated and consistent estimates closer to the hypothesized beneficial effect of therapy, while the addition of unique claims data elements did not change the estimates of clinical-only models (Figure 1). When the analysis was restricted to the 2,335 patients who needed therapy at discharge, there was no significant association between PT/OT use and functional score (Figure 2). 
Among the estimation approaches, TMLE models yielded more theory-consistent and precise estimates than IPTW models. Conclusions: Clinical data outperformed claims data in controlling for confounding by indication. Restricting to individuals who needed therapy reduced confounding by indication. The unexpected, non-significant effect may be explained by residual confounding and/or the imprecision of PT/OT measure. Incorporating clinical measures and robust analytic approaches is essential for valid estimates in comparative effectiveness research.

2025-11-03 — Evaluating the impact of stem cell transplantation in myelodysplastic syndrome using machine learning and comparative modeling

Authors: Ziyi Li, K. Chien, Yue Lyu, K. Sasaki, I. Bouligny, Yue Wei, G. Montalban-Bravo, S. Loghavi, A. Bataller, U. Popat, H. Kantarjian, A. Bazinet, G. Garcia-Manero
Year: 2025
Publication Date: 2025-11-03
Venue: Blood
DOI: 10.1182/blood-2025-5644
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Most patients with myelodysplastic syndromes (MDS) experience incomplete and short-lived responses to hypomethylating agents (HMAs), the current first-line therapy for higher-risk MDS. Loss of response and disease progression typically lead to poor survival outcomes, with median survival of just 4–6 months. Allogeneic stem cell transplantation (SCT) offers the potential to significantly extend survival, even after HMA failure, but it carries substantial risks and complications. Accurate prediction of survival and treatment outcomes is therefore essential for timely, informed decision-making between HMA and SCT. Existing risk stratification tools such as the Revised International Prognostic Scoring System (IPSS-R) and the Molecular IPSS (IPSS-M) classify patients into broad risk categories but fall short in providing personalized survival estimates. Moreover, these systems rely on linear models, whereas modern machine learning methods can capture both linear and nonlinear effects, offering more precise predictions. Methods: To address these gaps, we applied the Cox proportional hazards model alongside four machine learning approaches—Random Survival Forest (RSF), Gradient Boosting for Survival (GBM), eXtreme Gradient Boosting (XGBoost), and Ensemble Super Learner (SL)—to generate individualized survival predictions for MDS patients. We compared predicted versus observed survival in both SCT and non-SCT cohorts and further explored patient characteristics associated with longer- or shorter-than-expected survival among those who underwent SCT. Results: We trained Cox proportional hazards and four machine learning models using a cohort of 814 SCT-naïve MDS patients. Covariates included age, peripheral blood counts, bone marrow blasts, cytogenetic risk scores, and 15 common gene mutation statuses. The trained models were then independently applied to two non-overlapping testing cohorts: 555 patients who received SCT and 300 additional SCT-naïve patients. 
Compared to SCT-naïve patients, those who received SCT were significantly younger (median age 58 vs. 71 years, p < 0.001), had higher IPSS-R scores (5.5 vs. 4.0, p < 0.001), higher hemoglobin levels (9.7 vs. 9.2, p = 0.017), lower platelet counts (74 vs. 98, p = 0.001), and higher bone marrow blast percentages (6.0% vs. 3.0%, p < 0.001). The median time from diagnosis to SCT was 7 months (Interquartile range: 5–13). In the independent testing cohort of SCT-naïve patients, machine learning models outperformed the Cox model in survival prediction (C-index: 0.81 for RSF, 0.811 for GBM, 0.81 for XGBoost, 0.815 for Super Learner, vs. 0.776 for Cox). However, predictive accuracy was lower in the SCT cohort (C-index range: 0.679–0.696 across models), suggesting that SCT substantially alters patients' survival trajectories. Using predictions from the SL model, we classified patients who lived ≥3 months longer than predicted as “good” SCT responders (n = 266) and those who died ≥3 months earlier as “poor” responders (n = 132). A multivariable logistic regression model incorporating age, hemoglobin, platelet count, and cytogenetics score revealed that higher hemoglobin levels were significantly associated with lower odds of a favorable SCT response (odds ratio [OR] 0.79; 95% confidence interval [CI]: 0.69–0.89; p = 0.00019). Older age (OR per 10-year increase: 0.80; 95% CI: 0.66–0.97; p = 0.033), higher platelet count (OR per 10-unit increase: 0.98; 95% CI: 0.96–0.99; p = 0.0145), and lower cytogenetics risk scores (OR: 1.19; 95% CI: 1.00–1.43; p = 0.046) were also associated with decreased odds of SCT benefit. 
Additional clinical variables, including Absolute Neutrophil Count (ANC), bone marrow (BM) blast percentage, and TP53 mutation status, were evaluated in univariate analysis but did not reach statistical significance and were therefore excluded from the multivariable model (ANC: p = 0.59; BM blasts: p = 0.09; TP53: p = 0.26). Conclusions: This study presents the first systematic application of modern machine learning models to compare SCT and non-SCT outcomes in MDS. Our findings confirm that SCT can significantly alter patient survival—either prolonging or shortening life—and highlight key patient characteristics associated with treatment response. These results underscore the need for larger, stratified studies to better guide patient selection and optimize SCT o

2025-11-02 — From Path Coefficients to Targeted Estimands: A Comparison of Structural Equation Models (SEM) and Targeted Maximum Likelihood Estimation (TMLE)

Authors: Junjie Ma, Xiaoya Zhang, Guangye He, Yuting Han, Ting Ge, Feng Ji
Year: 2025
Publication Date: 2025-11-02
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Structural Equation Modeling (SEM) has gained popularity in the social sciences and causal inference due to its flexibility in modeling complex relationships between variables and its availability in modern statistical software. To move beyond the parametric assumptions of SEM, this paper reviews targeted maximum likelihood estimation (TMLE), a doubly robust, machine learning-based approach that builds on nonparametric SEM. We demonstrate that both TMLE and SEM can be used to estimate standard causal effects and show that TMLE is robust to model misspecification. We conducted simulation studies under both correct and misspecified model conditions, implementing SEM and TMLE to estimate these causal effects. The simulations confirm that TMLE consistently outperforms SEM under misspecification in terms of bias, mean squared error, and the validity of confidence intervals. We applied both approaches to a real-world dataset to analyze the mediation effects of poverty on access to high school, revealing that the direct effect is no longer significant under TMLE, whereas SEM indicates significance. We conclude with practical guidance on using SEM and TMLE in light of recent developments in targeted learning for causal inference.

2025-11-01 — Wear prediction of harmonic gears using a data-driven method based on stacking approach

Authors: Jiangkai Feng, Tao He, Ziyang Gong
Year: 2025
Publication Date: 2025-11-01
Venue: International Conference on New Materials, Machinery, and Vehicle Engineering
DOI: 10.1117/12.3084990
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Harmonic gear represents a critical component of transmission systems and is susceptible to progressive surface wear under prolonged exposure to complex operating conditions, which leads to lubrication failure and even systemic performance degradation. However, some conventional simulation methods employed in harmonic gear wear analysis exhibit suboptimal computational efficiency and time-consuming characteristics. To address these challenges, this study presents a data-driven approach for wear prediction in harmonic gears. This paper initially establishes a harmonic gear wear dataset, which is based on the three-dimensional mixed elastohydrodynamic lubrication and Archard models that were previously proposed. The dataset considers the influence of tooth modification parameters on wear resistance. Then, a Meta-Feature-Extended (MFE) stacking model is developed for wear prediction in harmonic gears, utilizing several classic machine learning algorithms and the stacking approach. The MFE stacking model’s predictive capacity is validated on the proposed dataset through comparative experiments with six classic machine learning models, the standard stacking model, and the super learner. The proposed method demonstrates satisfactory predictive performance on the testing dataset, thereby providing a theoretical foundation for the design optimization of harmonic gears aimed at wear resistance and lifespan prolonging.

2025-11-01 — Vaping nicotine is associated with increased medically-attended COVID-19 in women of reproductive age in an integrated health system

Authors: J. Velotta, Ilya Moskalenko, Shiyun Zhu, Sara R. Adams, J. Nugent, Tyler Chervo, Jacek Skarbinski, Judith J Prochaska, Qiana L Brown, Cynthia I. Campbell, Aurash J. Soroosh, Monique B. Does, K. Young-Wolff
Year: 2025
Publication Date: 2025-11-01
Venue: Journal of Thoracic Disease
DOI: 10.21037/jtd-2025-580
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background Vaping in adolescents and young adults is increasingly common and has been shown to cause deleterious effects. Little is known about the effects of vaping on risk of coronavirus disease 2019 (COVID-19) among women of reproductive age. This study evaluated whether vaping nicotine and/or cannabis during the year before pregnancy was associated with medically-attended COVID-19 episodes. Methods This large multicenter cross-sectional retrospective study evaluated women universally screened for vaping during the year before pregnancy as part of standard prenatal care from 9/1/2021 to 3/31/2023. Data came from the electronic health record and included nicotine and/or cannabis vaping during the year before pregnancy (exposure), medically-attended COVID-19 episode during the year before pregnancy (outcome), current and 5-year-history of tobacco smoking status, age, race/ethnicity, neighborhood deprivation index, body mass index, parity and Elixhauser Comorbidity Score. Associations between vaping and medically-attended COVID-19 episodes were estimated using Targeted Maximum Likelihood Estimation (TMLE) adjusting for covariates. Sensitivity analyses were performed after excluding women who had a history of current/former tobacco smoking. Results The sample of 71,508 reproductive-aged women had a mean (standard deviation) age of 31.7 (5.2) years and 67.6% were non-White. Overall, 2,347 (3.3%) reported vaping nicotine and 3,505 (4.9%) reported vaping cannabis during the year before pregnancy (2.47% vaped nicotine only, 4.10% vaped cannabis only, and 0.81% vaped both). The prevalence of having a medically-attended COVID-19 episode was higher among those who vaped vs. did not vape nicotine (16.9% vs. 14.1%) and among those who vaped nicotine only (17.0%) or nicotine and cannabis (16.8%) vs. neither (14.1%). In the adjusted analyses, the prevalence of a medically-attended COVID-19 episode was greater among those who vaped nicotine (vs. 
no nicotine vaping) [adjusted prevalence ratio (aPR) =1.33, 95% confidence interval (CI): 1.16–1.53] and among those who vaped nicotine only (aPR =1.32, 95% CI: 1.14–1.52) or both nicotine and cannabis (aPR =1.40, 95% CI: 1.28–1.54) vs. those who did not vape. Vaping cannabis was not associated with medically-attended COVID-19 episode risk. Conclusions Vaping nicotine only or in combination with cannabis was positively associated with medically-attended COVID-19 episodes among women during the year prior to pregnancy. Future research is needed to understand the mechanisms underlying this association.

2025-11-01 — Predictions of Small Intracranial Aneurysms’ Rupture Risk with Ensemble Machine Learning Model (Super Learner): A Retrospective Study in Two Tertiary Hospitals in China

Authors: Xiaolong Hu, Shifei Ye, Dayong Qi, Suya Li, Xiaoyu Tang, Yibin Fang
Year: 2025
Publication Date: 2025-11-01
Venue: International Journal of General Medicine
DOI: 10.2147/IJGM.S533558
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Purpose This research aims to investigate the morphological, clinical and hemodynamic parameters influencing intracranial aneurysm rupture and to develop an ensemble machine learning model (Super Learner) to predict rupture risk. Methods This retrospective study analyzed aneurysm patients from two hospitals. Five base learners—decision tree (DT), k-nearest neighbor (KNN), random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost)—were constructed based on demographic, hemodynamic and geometric parameters. A Super Learner model was subsequently constructed using ensemble learning algorithms, with all models validated on an independent external dataset. Results The dataset comprised 97 patients in the training cohort, 42 in the internal validation cohort, and 86 in the external validation cohort. In the external validation cohort, the area under the curve (AUC) values were: Super Learner 0.94 (0.89–1.00), Random Forest 0.83 (0.76–0.91), XGBoost 0.93 (0.87–0.99), Support Vector Machine 0.82 (0.73–0.92), Decision Tree 0.84 (0.76–0.93), and k-Nearest Neighbors 0.51 (0.38–0.63). Conclusion The Super Learner model outperforms individual models in both performance and stability for predicting intracranial aneurysm rupture risk. It has been robustly validated on an external dataset, demonstrating strong generalizability.
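
Illustrative sketch (not the authors' code): the abstract above describes combining several base learners into a Super Learner. The snippet below shows the simplest variant, a discrete Super Learner that picks the candidate with the lowest cross-validated risk. Everything in it is an assumption made for illustration: the toy regression data, the three NumPy-only candidates (`fit_mean`, `fit_ols`, `fit_knn`) standing in for the paper's DT/KNN/RF/SVM/XGBoost library, squared-error loss, and 5-fold cross-validation.

```python
import numpy as np

# Hypothetical candidate library. Each "fit" returns a prediction function.
def fit_mean(X, y):
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_ols(X, y):
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_knn(X, y, k=10):
    def predict(Xn):
        # Squared Euclidean distance from every new point to every training point.
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

LIBRARY = {"mean": fit_mean, "ols": fit_ols, "knn": fit_knn}

def discrete_super_learner(X, y, n_folds=5, seed=0):
    """Return (selected name, CV risks, selected learner refit on all data)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    Z = np.zeros((len(y), len(LIBRARY)))  # out-of-fold predictions
    for v in range(n_folds):
        train, valid = folds != v, folds == v
        for j, fit in enumerate(LIBRARY.values()):
            Z[valid, j] = fit(X[train], y[train])(X[valid])
    # Cross-validated mean squared error of each candidate.
    risks = {name: float(np.mean((y - Z[:, j]) ** 2))
             for j, name in enumerate(LIBRARY)}
    best = min(risks, key=risks.get)
    return best, risks, LIBRARY[best](X, y)

# Toy nonlinear data: the flexible candidate should have the lowest CV risk.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=300)
best, risks, predictor = discrete_super_learner(X, y)
print(best, risks)
```

The full Super Learner replaces the final selection step with a weighted combination of the candidates, with weights fitted to the same out-of-fold predictions.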

2025-11-01 — Lower rate of ischemic events with alirocumab in patients with atherosclerotic cardiovascular disease but without prior acute ischemic events

Authors: D. Bhatt, L. Tokgozoglu, N. Marx, I. Khan, M. C. Cabezas, J. Tunon, M. Vrablik, P. P. Filardi, S. Gruber, M. Van der Laan, K. Andrade, Z. Lou, K. Pol, K. Lee, K. Ray
Year: 2025
Publication Date: 2025-11-01
Venue: European Heart Journal
DOI: 10.1093/eurheartj/ehaf784.4289
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
ODYSSEY OUTCOMES showed a significant reduction in cardiovascular events, including mortality, in patients with recent acute coronary syndromes, randomized to alirocumab vs. placebo, for lowering low-density lipoprotein cholesterol (LDL-C). However, the trial did not evaluate patients without a prior history of acute events. To evaluate the potential benefit of alirocumab on acute ischemic events and mortality in patients with atherosclerotic cardiovascular disease (ASCVD) but without prior history of myocardial infarction or stroke. Patients with ASCVD but without prior acute ischemic events were identified from the Optum Research Database (January 2016–December 2022). Patients initiating alirocumab were matched 1:2 with no-PCSK9i (mAb or siRNA) users based on a propensity score that utilized patients’ baseline characteristics. The primary endpoint was the composite of nonfatal myocardial infarction, nonfatal ischemic stroke, and all-cause mortality. The causal estimands were defined as the relative risk reduction (RRR) and absolute risk reduction (ARR) at 3 years with a per-protocol analysis, meaning sustained treatment over time. Rigorous statistical methods for causal inference were utilized to try to obtain unbiased findings. Targeted maximum likelihood estimation (TMLE) was the primary analysis; parametric G-formula was the secondary analysis. The confounder set in these analyses included all baseline characteristics. A total of 3,301 patients receiving alirocumab 75 mg or 150 mg and 6,602 matched no-PCSK9i users were included. Baseline characteristics were well-balanced. Mean age was 66.8 years; 45.9% were male; 69.4%, 7.6%, and 16.3% had coronary, cerebrovascular, or peripheral artery disease, respectively; 30.5% had diabetes and 8.5% had chronic kidney disease stage 3 or higher. Mean LDL-C was 129.4 mg/dL (3.3 mmol/L) and baseline statin use was 52.2%. 
TMLE-generated survival curves demonstrated a 3-year event rate of 7.3% (95% confidence interval [CI]: 5.2%–9.4%) for alirocumab and 15.7% (95% CI: 14.2%–17.1%) for the no-PCSK9i group, resulting in an RRR of 53.3% (p<0.001) and an ARR of 8.3% (p<0.001) (Figure 1). With the parametric G-formula, the 3-year event rate was 7.8% (95% CI: 5.9%–9.1%) for alirocumab and 15.4% (95% CI: 14.2%–16.5%) for the no-PCSK9i group, resulting in an RRR of 49.3% (p<0.001) and an ARR of 7.6% (p<0.001) (Figure 2). Alirocumab therapy in ASCVD patients without prior acute ischemic events was associated with a significantly lower rate of the composite of non-fatal ischemic events or mortality.
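
Illustrative arithmetic (not from the paper): the abstract above defines its estimands as absolute and relative risk reductions at 3 years. The sketch below recomputes both from the rounded event rates quoted in the abstract, using the usual definitions ARR = control risk minus treated risk and RRR = ARR divided by control risk. The helper name `risk_reductions` is hypothetical. Because the inputs are rounded, the results can differ slightly from the reported RRR and ARR values.

```python
def risk_reductions(risk_treated, risk_control):
    """Absolute and relative risk reduction from two cumulative risks."""
    arr = risk_control - risk_treated
    rrr = arr / risk_control
    return arr, rrr

# Rounded 3-year event rates as quoted in the abstract.
tmle_arr, tmle_rrr = risk_reductions(0.073, 0.157)    # TMLE analysis
gform_arr, gform_rrr = risk_reductions(0.078, 0.154)  # parametric G-formula

print(f"TMLE:      ARR = {tmle_arr:.1%}, RRR = {tmle_rrr:.1%}")
print(f"G-formula: ARR = {gform_arr:.1%}, RRR = {gform_rrr:.1%}")
```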

2025-11-01 — Evaluation of Methods Adjusting for Unmeasured Confounding Using Large Healthcare Databases: An Empirical Study Concerning Drugs Inducing Prematurity

Authors: Chi-Hong Duong, Sylvie Escolano, Romain Demailly, Anne C M Thiébaut, J. Cottenet, Catherine Quantin, Pascale Tubert-Bitter, Ismaïl Ahmed
Year: 2025
Publication Date: 2025-11-01
Venue: Clinical and Translational Science
DOI: 10.1111/cts.70394
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
With the growing availability of large healthcare databases for clinical science, mitigating unmeasured confounding has emerged as a major issue in pharmacoepidemiologic studies. Extensions of causal inference methods to high‐dimensional settings could help address this problem, but studies comparing their performance in real‐world databases are still lacking. This study aims to compare the ability to reduce the measured and indirectly measured confounding of three causal inference methods adapted to a real‐world high‐dimensional database using a machine learning LASSO algorithm: G‐computation (GC), Targeted Maximum Likelihood Estimation (TMLE) and Propensity Score with overlap or stabilized inverse probability treatment weighting. This large‐scale empirical study was based on the French National Healthcare Claims Database (SNDS), consisting of 2,172,702 pregnancies ≥ 22 weeks of gestation over the period 2011–2014. We used a set of 42 negative and 13 positive reference drugs related to prematurity risk. For each reference drug, the logarithm of the odds ratio for prematurity and its 95% confidence interval were estimated using each method. The proportions of false positive and true positive associations were calculated and compared between the methods. All methods yielded fewer false positives than a crude model based on a minimal set of adjusted covariates. TMLE produced the lowest proportion of false positives (45.2%), followed by GC (47.6%). GC yielded the highest proportion of true positives (92.3%). Our results confirm the interest of causal inference methods exploiting the wealth of data in healthcare databases, especially GC in terms of performance and ease of implementation.

2025-11-01 — Estimation of Interventional Effects in Clinical Trials With a Repeatedly Measured Mediator

Authors: Marie Skov Breum, Mads Sundby Palle
Year: 2025
Publication Date: 2025-11-01
Venue: Statistics in Medicine
DOI: 10.1002/sim.70305
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Causal mediation analysis is an important tool in medical research, enabling researchers to examine and quantify the mechanisms through which a treatment exerts its effects on an outcome of interest. In many studies, the potential mediator is measured repeatedly over time, and we expect feedback between other post‐treatment covariates and the mediator. Randomized interventional (also called stochastic) effects have been proposed for defining causal mediation estimands in longitudinal settings, because they are identifiable in the presence of post‐treatment mediator‐outcome confounders. In this article, we use the interventional effects framework to investigate the mediating pathways in a clinical trial that compared the effect of semaglutide versus placebo on the histological resolution of nonalcoholic steatohepatitis (NASH). We define and identify various interventional direct and indirect effects relevant to the research question. For estimation, we propose an efficient and multiply robust Targeted Minimum Loss Estimator (TMLE), and we showcase the theoretical properties of the estimator in a simulation study.

2025-11-01 — Cardiovascular outcomes in sustained use of glucagon like peptide inhibitors: a trial inspired population vs. a broad population

Authors: K. Soerensen, P. Yazdanfard, U. Pedersen-Bjergaard, C. Torp-Pedersen
Year: 2025
Publication Date: 2025-11-01
Venue: European Heart Journal
DOI: 10.1093/eurheartj/ehaf784.4316
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
In type 2 diabetes, cardiovascular outcome trials have significantly influenced clinical practice, yet less than half of the patients receiving these therapies in real-world settings would have met the inclusion criteria for these trials. To apply a target trial emulation framework to evaluate the impact of sustained treatment with glucagon-like peptide-1 receptor agonist (GLP1-RA) versus dipeptidyl peptidase-4 inhibitor (DPP4i) on major adverse cardiovascular events (MACE) over three years in (1) a population adhering to trial-like inclusion and exclusion criteria and (2) a broader real-world population receiving these therapies, excluding the trial-inspired population. Using Danish nationwide registries, we identified new users of GLP1-RA or DPP4i for type 2 diabetes between 2012 and 2022. The trial-inspired population included patients aged 50 years or older with at least one of the following: heart failure, ischemic heart disease, stroke, or chronic kidney disease. Alternatively, patients aged 60 or older were eligible if they had hypertension or chronic obstructive pulmonary disease. Additionally, all patients were required to have an HbA1c of at least 53 mmol/mol in the year preceding treatment initiation. Not including the trial-inspired population, the broad population was defined by the same HbA1c threshold but included all patients prescribed GLP1-RA or DPP4i as second-line or add-on glucose-lowering therapy without further restrictions. The primary outcome was a composite of cardiovascular death, myocardial infarction, or stroke (MACE). Patients were followed for up to three years or until the occurrence of MACE, all-cause mortality, emigration, three-year follow-up completion, or December 31, 2022. A causal inference approach using longitudinal targeted maximum likelihood estimation was applied to estimate the average treatment effect incorporating information on pre- and post-baseline confounders. 
In the trial-inspired population, 6,619 patients initiated GLP1-RA, while 19,211 initiated DPP4i. Sustained GLP1-RA use was associated with an absolute 3-year MACE risk of 8.2% (95% CI: 6.6% to 9.8%) and an absolute risk reduction of 1.8% (95% CI: 0.0% to 3.5%) compared to DPP4i. In the broad population, 10,872 patients initiated GLP1-RA, and 16,743 initiated DPP4i. Here, sustained GLP1-RA use was associated with an absolute 3-year MACE risk of 5.3% (95% CI: 4.3% to 6.4%) and an absolute risk reduction of 0.1% (95% CI: -0.3% to 0.2%) compared to DPP4i. The cardiovascular benefits of GLP1-RA observed in clinical trials appear to extend to real-world patients who meet similar eligibility criteria. However, in a broad, less-selected population these benefits are negligible. This suggests that patient characteristics and real-world treatment factors may influence the magnitude of cardiovascular risk reduction.

2025-10-30 — A Unified Theory for Causal Inference: Direct Debiased Machine Learning via Bregman-Riesz Regression

Authors: Masahiro Kato
Year: 2025
Publication Date: 2025-10-30
Venue: arXiv.org
DOI: 10.48550/arXiv.2510.26783
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
This note introduces a unified theory for causal inference that integrates Riesz regression, covariate balancing, density-ratio estimation (DRE), targeted maximum likelihood estimation (TMLE), and the matching estimator in average treatment effect (ATE) estimation. In ATE estimation, the balancing weights and the regression functions of the outcome play important roles, where the balancing weights are referred to as the Riesz representer, bias-correction term, and clever covariates, depending on the context. Riesz regression, covariate balancing, DRE, and the matching estimator are methods for estimating the balancing weights, where Riesz regression is essentially equivalent to DRE in the ATE context, the matching estimator is a special case of DRE, and DRE is in a dual relationship with covariate balancing. TMLE is a method for constructing regression function estimators such that the leading bias term becomes zero. Nearest Neighbor Matching is equivalent to Least Squares Density Ratio Estimation and Riesz Regression.
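
Illustrative sketch (not the paper's framework): the abstract above refers to TMLE's clever covariate and to updating the outcome regression so that the leading bias term vanishes. The snippet below walks through the standard textbook TMLE steps for the ATE with a binary outcome on simulated data. All names (`fit_logistic`, `tmle_ate`), the simulated data-generating process, the plain logistic regressions used as initial estimators, and the propensity clipping bounds are assumptions made for illustration. In applied work the initial fits would typically come from a Super Learner.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Newton-Raphson logistic regression (no intercept added automatically)."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def tmle_ate(W, A, Y):
    n = len(Y)
    ones, zeros = np.ones(n), np.zeros(n)
    # Step 1: initial outcome regression Q(A, W) = P(Y = 1 | A, W).
    b_q = fit_logistic(np.column_stack([ones, A, W]), Y)
    Q1 = expit(np.column_stack([ones, ones, W]) @ b_q)
    Q0 = expit(np.column_stack([ones, zeros, W]) @ b_q)
    QA = np.where(A == 1, Q1, Q0)
    # Step 2: propensity score g(W) = P(A = 1 | W), clipped away from 0 and 1.
    Xg = np.column_stack([ones, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), 0.01, 0.99)
    # Step 3: clever covariate H(A, W).
    H = A / g - (1 - A) / (1 - g)
    # Step 4: targeting step, a one-parameter logistic fluctuation with offset.
    eps = fit_logistic(H[:, None], Y, offset=logit(QA))[0]
    Q1s = expit(logit(Q1) + eps / g)
    Q0s = expit(logit(Q0) - eps / (1 - g))
    QAs = np.where(A == 1, Q1s, Q0s)
    # Step 5: plug-in estimate; the clever-covariate score is solved (near zero).
    return float(np.mean(Q1s - Q0s)), float(np.mean(H * (Y - QAs)))

# Simulated data with one confounder W (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W))
Y = rng.binomial(1, expit(-0.5 + A + W))
ate_hat, score = tmle_ate(W, A, Y)
print(f"TMLE ATE estimate: {ate_hat:.3f}, clever-covariate score: {score:.2e}")
```

The second returned value is the empirical mean of the clever-covariate score after targeting. The fluctuation step drives it to approximately zero, which is the sense in which the update removes the first-order bias term.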

2025-10-28 — Machine Learning for Causal Inference in Hospital Diabetes Care: TMLE Analysis of Selection Bias in Diabetic Foot Infection Treatment—A Cautionary Tale

Authors: Rim Hur, Robert Rushakoff
Year: 2025
Publication Date: 2025-10-28
Venue: Diabetology
DOI: 10.3390/diabetology6110122
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background/Objectives: Diabetic foot infections (DFIs) are a leading cause of hospitalization, amputation, and costs among patients with diabetes. Although early treatment is assumed to reduce complications, its real-world effects remain uncertain. We applied a causal machine-learning (ML) approach to investigate whether early DFI treatment improves hospitalization and clinical outcomes. Methods: We conducted an observational study using de-identified UCSF electronic health record (EHR) data from 1434 adults with DFI (2015–2024). Early treatment (<3 days after diagnosis) was compared to delayed/no treatment (≥3 days or none). Outcomes included DFI-related hospitalization and lower-extremity amputation (LEA). Confounders included demographics, comorbidities, antidiabetic medication use, and laboratory values. We applied Targeted Maximum Likelihood Estimation (TMLE) with SuperLearner, a machine-learning ensemble. Results: Early treatment was associated with higher hospitalization risk (TMLE risk difference [RD]: 0.293; 95% CI: 0.220–0.367), reflecting the triage of clinically sicker patients. In contrast, early treatment showed a protective trend against amputation (TMLE RD: −0.040; 95% CI: −0.098 to 0.066). Results were consistent across estimation methods and robust to bootstrap validation. A major limitation is that many patients likely received treatment outside UCSF, introducing uncertainty around exposure classification. Conclusions: Early treatment of DFIs increased hospitalization but reduced amputation risk, a paradox reflecting appropriate clinical triage and systematic exposure misclassification from fragmented healthcare records. Providers prioritize the sickest patients for early intervention, leading to greater short-term utilization but potentially preventing irreversible complications. 
These findings highlight a cautionary tale; even with causal ML, single-institution analyses may misrepresent treatment effects, underscoring the need for causally informed decision support and unified EHR data.

2025-10-28 — A Cluster Randomized Trial Evaluating a Care Model Adapted to People Living with HIV in Senegal

Authors: Assane Diouf, Aminata Massaly-Ndiaye, N. Ngom, S. Sow, Ndeye Bineta Ndiaye-Coulibaly, Louise Fortes, N. M. Dia-Badiane, Safiatou Thiam, Moussa Seydi
Year: 2025
Publication Date: 2025-10-28
Venue: Asian Journal of Research in Infectious Diseases
DOI: 10.9734/ajrid/2025/v16i11500
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Aims: To compare the retention rate of the ART-stable (on ART for ≥ 6 months with a stable condition) people living with HIV (PLHIV) between a care model adapted to PLHIVs (CMAP) combining task shifting and differentiated follow-up (intervention) and the standard of care (control). Study Design: Cluster randomized trial with both quantitative and qualitative components. Place and Duration of Study: Between July 2017 and July 2019 in the 12 health districts of Saint and Tambacouda regions, Senegal. Methodology: We included 1014 PLHIVs (429 in the intervention arm, 585 in the control arm). The mean age was 40.6 ± 13 years; 72% were female, 39.7% at WHO clinical stages 3-4. Arms were compared using Targeted Maximum Likelihood Estimation accounting for clustering. A socio-anthropological survey was carried out among caregivers and PLHIVs through focus group discussions and interviews to elicit the perceptions on the CMAP. The interviews were subjected to thematic analysis with Atlas Ti. Results: After 18 months of follow-up, the retention rate was 94.4% (95% CI = 93.8%-96.2%) in the CMAP arm versus 92.8% (95% CI = 90.2%-93.7%) in the control arm. The duration of the trip to the health facility (26 minutes vs 68 minutes; p < 0.01), the transport costs (1 US$ vs 6 US$; p < 0.01) and the time spent in the health facility (31 minutes vs 89 minutes; p < 0.01) were lower in the intervention arm. Six (6) focus groups and 25 interviews involving 42 caregivers and 28 PLHIVs were conducted in the CMAP arm. The qualitative analyses revealed that caregivers and PLHIVs were supportive of CMAP. Conclusion: The CMAP was associated with increased retention, shorter travel time and decreased cost in HIV care.

2025-10-27 — Estimating the Causal Effect of Realistic Treatment Strategies Using Longitudinal Observational Data

Authors: Yingying Zhang, Alastair Bennett, Andrea Manca, Moshe Mittelman, Marlijn P A Hoeks, Alex Smith, Adele Taylor, Reinhard Stauder, Theo de Witte, L. Malcovati, C. V. van Marrewijk, Noémi Kreif
Year: 2025
Publication Date: 2025-10-27
Venue: Medical decision making
DOI: 10.1177/0272989X251379819
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background Real-world data can inform health care decisions by allowing the evaluation of nuanced treatment strategies. Longitudinal observational data enable the assessment of dynamic treatment regimes (DTRs), strategies that adapt treatment over time based on patient history, but require causal inference methods to address time-varying confounding. Longitudinal targeted minimum loss-based estimation (LTMLE) is a machine learning–based double-robust approach for improved causal effect estimation. Methods We applied LTMLE to longitudinal registry data to evaluate the impact of erythropoiesis-stimulating agents (ESAs) in the clinical management of low to intermediate-1 risk myelodysplastic syndrome (MDS). We defined DTRs based on clinically relevant decision rules (e.g., commencing treatment when the hemoglobin level falls below a threshold) and compared them to static treatment regimes (always or never giving ESAs). Outcomes include mortality and health-related quality of life measured by EQ-5D scores. Results The static regime of never administering ESAs resulted in declining counterfactual EQ-5D scores and increasing mortality risk over time. In contrast, both the static regime of continuous administration of ESAs and the use of dynamic regimes improved the EQ-5D scores and tended to reduce mortality, although the mortality differences were not statistically significant. Conclusions The article provides a case study application of the LTMLE method to evaluate realistic treatment policies under time-varying confounding. The findings support the potential benefits of dynamic treatment strategies for the management of MDS, highlighting the importance of personalized treatment adaptation. The study contributes methodological insights into the applications of LTMLE in small-sample, long-follow-up settings relevant to health technology assessment and policy making. 
Highlights This study applies the longitudinal targeted minimum loss estimation (LTMLE) method to evaluate the causal effect of static and dynamic treatment strategies using longitudinal observational data. We demonstrate the use of the LTMLE method to assess the impact of erythropoiesis stimulating agents (ESAs) on quality of life and mortality in patients with low to intermediate-1 risk myelodysplastic syndromes. The findings suggest that patients treated under dynamic ESA treatment regimes show an improved quality of life measured by EQ-5D scores and survival compared with those treated under the static treatment regime of never administering ESAs. This study contributes to the methodological literature by showcasing the application of the LTMLE method in a small-sample, long-follow-up setting with time-varying confounding, informing health technology assessment and policy decisions.

2025-10-27 — Direct Debiased Machine Learning via Bregman Divergence Minimization

Authors: Masahiro Kato
Year: 2025
Publication Date: 2025-10-27
Venue: arXiv.org
DOI: 10.48550/arXiv.2510.23534
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
We develop a direct debiased machine learning framework comprising Neyman targeted estimation and generalized Riesz regression. Our framework unifies Riesz regression for automatic debiased machine learning, covariate balancing, targeted maximum likelihood estimation (TMLE), and density-ratio estimation. In many problems involving causal effects or structural models, the parameters of interest depend on regression functions. Plugging regression functions estimated by machine learning methods into the identifying equations can yield poor performance because of first-stage bias. To reduce such bias, debiased machine learning employs Neyman orthogonal estimating equations. Debiased machine learning typically requires estimation of the Riesz representer and the regression function. For this problem, we develop a direct debiased machine learning framework with an end-to-end algorithm. We formulate estimation of the nuisance parameters, the regression function and the Riesz representer, as minimizing the discrepancy between Neyman orthogonal scores computed with known and unknown nuisance parameters, which we refer to as Neyman targeted estimation. Neyman targeted estimation includes Riesz representer estimation, and we measure discrepancies using the Bregman divergence. The Bregman divergence encompasses various loss functions as special cases, where the squared loss yields Riesz regression and the Kullback-Leibler divergence yields entropy balancing. We refer to this Riesz representer estimation as generalized Riesz regression. Neyman targeted estimation also yields TMLE as a special case for regression function estimation. Furthermore, for specific pairs of models and Riesz representer estimation methods, we can automatically obtain the covariate balancing property without explicitly solving the covariate balancing objective.
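For readers unfamiliar with the Bregman divergence mentioned in the abstract, the block below gives the standard textbook definition for a scalar argument and checks the two special cases the abstract names. The symbols \(\phi\), \(a\), and \(b\) are illustrative and are not taken from the paper, whose exact objective may differ.

```latex
% Bregman divergence generated by a differentiable, strictly convex \phi
D_{\phi}(a \,\|\, b) = \phi(a) - \phi(b) - \phi'(b)\,(a - b)

% Squared loss: \phi(t) = t^2, so \phi'(t) = 2t
D_{\phi}(a \,\|\, b) = a^2 - b^2 - 2b\,(a - b) = (a - b)^2

% (Generalized) Kullback-Leibler: \phi(t) = t \log t - t, so \phi'(t) = \log t
D_{\phi}(a \,\|\, b) = a \log\frac{a}{b} - a + b
```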

2025-10-25 — Protecting Mobile Communication Channels: Adaptive Super Learner-Based Malware Detection for Android

Authors: Noureldin H. Radwan, Saad M. Ibrahim, Reem Essameldin
Year: 2025
Publication Date: 2025-10-25
Venue: Novel Intelligent and Leading Emerging Sciences Conference
DOI: 10.1109/NILES68063.2025.11232001
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Securing mobile communication channels is increasingly vital due to the growing dependence on Android applications, which are common targets for sophisticated malware. These threats can compromise user privacy, damage system files, and even harm device hardware. The open nature of Android and its large user base amplify its vulnerability. Traditional detection methods often fail to identify new and obfuscated malware variants, such as polymorphic and zero-day attacks. While machine learning offers improvements, many models lack scalability and adaptability to evolving threats. To overcome these challenges, this study introduces an Adaptive Super Learner-Based Malware Detection Model tailored for Android systems. The model combines optimized base learners—Decision Tree (DT), K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and LightGBM (LGBM)—with Logistic Regression as a meta-learner, enabling dynamic response to emerging malware behaviors. Genetic Algorithm (GA) is employed to select the most relevant features, reducing the dimensionality of the input space while enhancing performance. Evaluated on the Drebin dataset, the model achieved a high accuracy of 99.6% using the full feature set, and 99.3% accuracy after reducing features from 215 to 97—demonstrating strong detection capability, adaptability, and efficiency in securing Android-based communication platforms.

2025-10-25 — Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity

Authors: Christoph Wiederkehr, Christian Heumann, Michael L. Stein
Year: 2025
Publication Date: 2025-10-25
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation, tmle

Abstract:
We evaluate the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect in missing data scenarios under varying levels of positivity violations. We employ model- and design-based simulations, with the latter using undersmoothed highly adaptive lasso on the 'WASH Benefits Bangladesh' dataset to mimic real-world complexities. Five missingness-directed acyclic graphs are considered, capturing common missing data mechanisms in epidemiological research, particularly in one-point exposure studies. These mechanisms also include not-at-random missingness in the exposure, outcome, and confounders. We compare eight missing data methods in conjunction with TMLE as the analysis method, distinguishing between non-multiple imputation (non-MI) and multiple imputation (MI) approaches. The MI approaches use both parametric and machine-learning models. Results show that non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias compared to all other evaluated missing data methods and greater robustness against positivity violations across. In comparison, MI with classification and regression trees (CART) achieves lower root mean squared error, while often maintaining nominal coverage rates. Our findings highlight the trade-offs between bias and coverage, and we recommend using complete cases with TMLE incorporating an outcome-missingness model for bias reduction and MI CART when accurate confidence intervals are the priority.
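As a rough illustration of what a TMLE for the average treatment effect does, here is a minimal sketch for the simplest possible setting: one binary covariate, binary treatment, binary outcome, and no missing data. It is not the procedure evaluated in the paper. The function names (`make_data`, `tmle_ate`), the toy counts, and the deliberately misspecified initial outcome model are all assumptions made for the example.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def make_data():
    """Hypothetical toy data: (w, a) -> (group size, number of events)."""
    counts = {(0, 0): (30, 6), (0, 1): (10, 4), (1, 0): (10, 5), (1, 1): (30, 21)}
    data = []
    for (w, a), (n, events) in counts.items():
        data += [(w, a, 1)] * events + [(w, a, 0)] * (n - events)
    return data

def tmle_ate(data, tol=1e-10, max_iter=100):
    """Return (ate, eps, score) for rows (w, a, y) with binary w, a, y.

    Assumes 0 < g(w) < 1 and 0 < Q0(a) < 1 (no positivity problems).
    """
    n = len(data)
    # Propensity score g(w) = P(A=1 | W=w), from empirical proportions.
    g = {}
    for w in {row[0] for row in data}:
        treated = [a for ww, a, _ in data if ww == w]
        g[w] = sum(treated) / len(treated)
    # Initial outcome regression that (wrongly) ignores W on purpose,
    # so that the targeting step has something to correct.
    q0 = {}
    for a in (0, 1):
        ys = [y for _, aa, y in data if aa == a]
        q0[a] = sum(ys) / len(ys)

    def clever(a, w):
        # "Clever covariate" for the ATE.
        return a / g[w] - (1 - a) / (1.0 - g[w])

    def q_star(a, w, eps):
        # Logistic fluctuation of the initial fit along the clever covariate.
        return expit(logit(q0[a]) + eps * clever(a, w))

    def score_at(eps):
        return sum(clever(a, w) * (y - q_star(a, w, eps)) for w, a, y in data)

    # Newton iterations for the one-dimensional fluctuation parameter eps.
    eps = 0.0
    for _ in range(max_iter):
        score = score_at(eps)
        if abs(score) < tol:
            break
        info = sum(
            clever(a, w) ** 2 * q_star(a, w, eps) * (1.0 - q_star(a, w, eps))
            for w, a, _ in data
        )
        eps += score / info
    # Plug-in estimate from the updated (targeted) outcome regression.
    ate = sum(q_star(1, w, eps) - q_star(0, w, eps) for w, _, _ in data) / n
    return ate, eps, score_at(eps)
```

In this toy data the treatment effect is 0.2 within each stratum of the covariate, while the unadjusted difference in means is 0.35. Because the propensity score is estimated by stratum proportions, the targeted estimate from `tmle_ate(make_data())` should come out at about 0.2 even though the initial outcome model ignores the covariate.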

2025-10-23 — A Multi-Component, Peer-Led Intervention for Pregnant and Post-Partum Women with HIV Improves Viral Suppression Over Time in Rural Uganda

Authors: Jane Kabami, L.B Balzer, Faith Kagoya, Jaffer Okiring, Joanita Nangendo, Emmanuel Ruhamyankaka, Peter Ssebutinde, E. Kakande, Elizabeth Arinitwe, John Bosco Tamu Munezeo, Valence Mfitumukiza, Stella Kabageni, Michael Ayebare, Anne Ruhweza Katahoire, Phillippa Musoke, M. Kamya
Year: 2025
Publication Date: 2025-10-23
Venue: AJE Advances: Research in Epidemiology
DOI: 10.1093/ajeadv/uuaf017
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Achieving viral suppression among pregnant and breastfeeding women with HIV is essential to promoting their health and eliminating vertical transmission of HIV. We hypothesized that a multi-component and peer-led intervention would increase viral suppression among pregnant and breastfeeding women with HIV in rural Southwestern Uganda. The ENHANCED-SPS intervention included the development of a counseling protocol, point-of-care viral load testing, and standardized support delivered by peer-mothers. Among 505 pregnant and post-partum women receiving the ENHANCED-SPS intervention (2019-2021), we evaluated the change in viral suppression (HIV RNA <1000 c/mL) from baseline to 12 months of follow-up with targeted minimum loss-based estimation (TMLE), accounting for clustering and missing outcomes. The proportion with viral suppression was 70.0% (95%CI: 65.9-74.1%) at baseline and 94.9% (95%CI: 92.5-97.4%) at 12 months, corresponding to a 24.9% (95%CI: 21.6-28.2%; p<0.001) absolute increase over time. Significant improvements over time were observed across age groups (15-24 years, 25-34 years, and 35+ years) and for both pregnant and post-partum women. Approximately 95% of women in all age groups and pregnant women achieved viral suppression at 12 months. However, post-partum women lagged behind with only 75.7% viral suppression at 12 months, despite a 58.9% (95%CI: 27.4-90.3%) increase from baseline. The multi-component, peer-led ENHANCED-SPS intervention resulted in meaningful improvements in viral suppression for pregnant and breastfeeding women with HIV; however, additional support is needed during the post-partum period.

2025-10-22 — CEILING EFFECT OF THE COMBINED NORWEGIAN AND DANISH KNEE LIGAMENT REGISTERS LIMITS ANTERIOR CRUCIATE LIGAMENT RECONSTRUCTION OUTCOME PREDICTION

Authors: R. K. Martin, S. Wastvedt, A. Pareek, A. Persson, H. Visnes, A. Fenstad, G. Moatshe, J. Wolfson, M. Lind
Year: 2025
Publication Date: 2025-10-22
Venue: Orthopaedic Proceedings
DOI: 10.1302/1358-992x.2025.10.017
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Clinical tools based on machine learning analysis now exist for outcome prediction following primary anterior cruciate ligament reconstruction (ACLR). Relying partly on data volume, a general principle is that more data may lead to improved model accuracy. The purpose of this study was to apply machine learning to a combined dataset comprised of the Norwegian and Danish knee ligament registers (KLR) with the aim of producing an algorithm that can predict subsequent revision surgery with improved accuracy relative to a previously published model developed using only the Norwegian register. The hypothesis was that the additional patient data would result in an algorithm that is more accurate. Machine learning analysis was performed on the combined KLR. The primary outcome was the probability of revision ACL reconstruction within 1, 2, and 5 years. Data was split randomly into training sets (75%) and test sets (25%). Four machine learning models intended for this type of data were tested: Cox Lasso, survival random forest, gradient boosted regression (GBM), and super learner. Model performance was evaluated by calculating concordance and calibration using methods adapted for censored data. Concordance measures the proportion of pairs of observations in which predicted ranking of survival probabilities corresponds to actual ranking. Calibration is a measure of the accuracy of predicted probabilities that compares expected to actual outcomes. The combined registry population consisted of 62,955 patients and 5.1% of patients underwent revision surgery during a mean follow-up time of 7.6 ± 4.5 years. The best performing models had concordance in the moderate range (0.67, 95% CI 0.64–0.70) and were well calibrated at all follow-up times. Multiply imputed data did not show notable differences from the complete case analysis. 
Machine learning analysis of the combined registers enabled the prediction of subsequent revision surgery risk after primary ACLR with moderate accuracy. However, the most important finding of this study was that this analysis of nearly 63,000 patients yielded similar prediction accuracy as a previous study of approximately 25,000 patients. This suggests a so-called ceiling effect of the registers has been reached and that simply adding more patients to the database is unlikely to appreciably improve prediction accuracy. For an improvement in the ability to predict outcome based on knee ligament register data, an evolution in the variables collected is required. This represents a significant challenge as the balance between optimal variable collection and surgeon compliance is a delicate one; data collection must be streamlined to avoid survey fatigue and the addition of variables to the registry must be carefully considered, weighing the added value against the additional burden on the surgeons which may compromise compliance. Less onerous methods of improving data collection such as natural language processing may hold the key to improving outcome prediction using the knee ligament registers.
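The concordance measure described in the abstract can be sketched as a pairwise count. The version below is a simplified Harrell-style index for right-censored data, not the study's own code. It assumes a higher `risks` value means an earlier expected event, which is the reverse ordering of a predicted survival probability. It ignores pairs with tied times, and the function name `concordance_index` is made up for this example.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs whose predicted ranking matches the observed one.

    times  : follow-up time per subject
    events : 1 if the event (e.g. revision) was observed, 0 if censored
    risks  : predicted risk score, higher = expected to fail earlier
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            # A pair is comparable when i had an observed event strictly before time j.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the abstract's reported 0.67 should be read.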

2025-10-19 — Causal inference for calibrated scaling interventions on time-to-event processes

Authors: H. Rytgaard, Mark van der Laan
Year: 2025
Publication Date: 2025-10-19
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
This work develops a flexible inferential framework for nonparametric causal inference in time-to-event settings, based on stochastic interventions defined through multiplicative scaling of the intensity governing an intermediate event process. These interventions induce a family of estimands indexed by a scalar parameter α, representing effects of modifying event rates while preserving the temporal and covariate-dependent structure of the observed data generating mechanism. To enhance interpretability, we introduce calibrated interventions, where α is chosen to achieve a pre-specified goal, such as a desired level of cumulative risk of the intermediate event, and define corresponding composite target parameters capturing the downstream effects on the outcome process. This yields clinically meaningful contrasts while avoiding unrealistic deterministic intervention regimes. Under a nonparametric model, we derive efficient influence curves for α-indexed, calibrated, and composite target parameters and establish their double robustness properties. We further sketch a targeted maximum likelihood estimation (TMLE) strategy that accommodates flexible, machine learning based nuisance estimation. The proposed framework applies broadly to (causal) questions involving time-to-event treatments or mediators and is illustrated through different examples of event-history settings. A simulation study demonstrates finite-sample inferential properties, and highlights the implications of practical positivity violations when interventions extend beyond observed data support.

2025-10-16 — Machine learning for propensity score estimation: A systematic review and reporting guidelines.

Authors: Walter L. Leite, Huibin Zhang, Zachary K. Collier, Kamal Chawla, Lingchen Kong, Yongseok Lee, Jia Quan, Olushola O Soyoye
Year: 2025
Publication Date: 2025-10-16
Venue: Psychological methods
DOI: 10.1037/met0000789
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Machine learning (ML) has become a common approach for estimating propensity scores (PSs) for quasi-experimental research using matching, weighting, or stratification on the PS. This systematic review examined 179 applications of ML for PS estimation across different fields, such as health, education, social sciences, and business over 40 years. The results show that the gradient boosting machine (GBM) is the most frequently used method, followed by random forest. Classification and regression trees, neural networks, and the super learner were also used in more than 5% of studies. The most frequently used packages to estimate PSs were twang, gbm, and randomforest in the R statistical software. The review identified that critical steps of the propensity score analysis are frequently underreported. Covariate balance evaluation was not reported by 48.04% of articles. Also, improper use of p values for covariate balance evaluation was identified in 13.97% of the studies. Only 22.8% of studies performed a sensitivity analysis. Many hyperparameter configurations were used for ML methods, but only 46.9% of studies reported the hyperparameters used. A set of guidelines for reporting the use of ML for PS estimation is provided. (PsycInfo Database Record (c) 2025 APA, all rights reserved).

2025-10-16 — Effect of juvenile justice financial sanctions on youths' recidivism.

Authors: Luyi Jian, Jennifer L. Skeem, Jaclyn E. Chambers
Year: 2025
Publication Date: 2025-10-16
Venue: Law and Human Behavior
DOI: 10.1037/lhb0000636
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
OBJECTIVES Advocacy for reforming financial sanctions (i.e., fees, fines, and restitution) in the juvenile justice system is growing, with a particular focus on eliminating fees. Although a key argument is that these sanctions may increase the likelihood of reoffending, studies that examine the link between financial sanctions and recidivism are limited and their results are mixed. Using administrative data on a large, diverse, and at-risk population of juvenile probationers in an urban county, we tested the impact that financial sanctions have overall, and that fees have specifically, on young people's risk of probation violations and rearrest over a 2-year period. HYPOTHESES We tentatively hypothesized that (a) financial sanctions overall would increase both probation violations and rearrests, and (b) fees alone would increase probation violations but not rearrests. METHOD We accessed, linked, and analyzed data from county and state agencies for a sample of 2,401 youth under supervision. We applied a rigorous causal inference approach (targeted maximum likelihood estimation) combined with machine learning to test the hypotheses. RESULTS Financial sanctions overall modestly increased the likelihood of both probation violations (from an estimated 9% to 14%) and rearrests (from an estimated 54% to 58%), but fees alone did not significantly predict either outcome. The effects of financial sanctions on recidivism were not moderated by the youth's race, socioeconomic status, or cumulative risk. CONCLUSIONS Financial sanctions burden families but are weak risk factors for recidivism. If the goal is to prevent re-offending, reform efforts could focus on broader financial sanctions than just fees and prioritize more powerful levers like evidence-based programs and services. (PsycInfo Database Record (c) 2025 APA, all rights reserved).

2025-10-15 — postcard: An R Package for Marginal Effect Estimation with or without Prognostic Score Adjustment

Authors: Mathias Lerbech Jeppesen, Emilie Højbjerre-Frandsen
Year: 2025
Publication Date: 2025-10-15
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Covariate adjustment is a widely used technique in randomized clinical trials (RCTs) for improving the efficiency of treatment effect estimators. By adjusting for predictive baseline covariates, variance can be reduced, enhancing statistical precision and study power. Rosenblum and van der Laan [2010] use the framework of generalized linear models (GLMs) in a plug-in analysis to show efficiency gains using covariate adjustment for marginal effect estimation. Recently the use of prognostic scores as adjustment covariates has gained popularity. Schuler et al. [2022] introduce and validate the method for continuous endpoints using linear models. Building on this work, Højbjerre-Frandsen et al. [2025] extend the method proposed by Schuler et al. [2022] to be used in combination with the GLM plug-in procedure [Rosenblum and van der Laan, 2010]. This method achieves semi-parametric efficiency under assumptions of additive treatment effects on the link scale. Additionally, Højbjerre-Frandsen et al. [2025] provide a formula for power approximation which is valid even under model misspecification, enabling realistic sample size estimation. This article introduces an R package, which implements the GLM plug-in method with or without PrOgnoSTic CovARiate aDjustment, postcard. The package has two core features: (1) estimating marginal effects and the variance hereof (with or without prognostic adjustment) and (2) approximating statistical power. Functionalities also include integration of the Discrete Super Learner for constructing prognostic scores and simulation capabilities for exploring the methods in practice. Through examples and simulations, we demonstrate postcard as a practical toolkit for statisticians.
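The Discrete Super Learner mentioned in the abstract picks, from a library of candidate learners, the one with the lowest cross-validated risk. The sketch below shows that selection step in plain Python with two toy learners and squared-error loss. It does not use or mirror the postcard package's API. Every name in it (`fit_mean`, `fit_linear`, `cv_risk`, `discrete_super_learner`) is hypothetical, and the interleaved fold assignment is a simplification.

```python
def fit_mean(xs, ys):
    """Candidate 1: predict the training mean regardless of x."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Candidate 2: simple least-squares line in one covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def cv_risk(fit, xs, ys, k=5):
    """K-fold cross-validated mean squared error (folds assigned by index mod k)."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        valid = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - model(xs[i])) ** 2 for i in valid)
    return total / n

def discrete_super_learner(learners, xs, ys, k=5):
    """Return (name of best learner, that learner refit on all data, CV risks)."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in learners.items()}
    best = min(risks, key=risks.get)
    return best, learners[best](xs, ys), risks
```

A call such as `discrete_super_learner({"mean": fit_mean, "linear": fit_linear}, xs, ys)` returns the winning learner refit on the full data, whose predictions could then serve as a prognostic score.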

2025-10-14 — Prevalence and factors associated with HIV drug resistance among adult persons living with HIV/AIDS in nine countries of Sub-Saharan Africa using population-based HIV impact assessments: 2015–2019

Authors: E. Nsonga, W. Ng’ambi, M. Goma, C. Zyambo
Year: 2025
Publication Date: 2025-10-14
Venue: BMC Public Health
DOI: 10.1186/s12889-025-24633-9
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
HIV drug resistance (HIVDR) remains a significant challenge in sub-Saharan Africa (SSA) due to limited effective treatment and varying healthcare resources. Using the first widely available HIVDR surveillance data in SSA, we calculated the prevalence and associated factors of HIVDR amongst the persons that were on ART between 2015 and 2019 using the Population-based HIV Impact Assessment (PHIA). A secondary analysis was conducted of the combined PHIA HIVDR data from Cameroon, Malawi, Eswatini, Ethiopia, Namibia, Rwanda, Tanzania, Zambia and Zimbabwe over the 2015–2019 period. All the 1,008 persons with HIVDR information were included in the analysis. We calculated frequencies, proportions, 95% confidence intervals (95%CI), crude/adjusted odds ratios (cOR/aOR), and chi-square statistics in R. HIVDR was determined through genotypic testing of blood samples from HIV positive individuals with detectable viral load. We examined the prevalence and associated factors of HIVDR in SSA. Standard assays were used to identify mutations linked to resistance to NRTIs, NNRTIs, and PIs. Presence of at least one mutation indicated HIVDR status. Supervised machine learning models were developed in RStudio using the SuperLearner and caret packages, training six algorithms to predict HIVDR. Variable importance was assessed using a random forest model, while predicted probabilities were generated via Elastic Net LASSO regression with 10-fold cross-validation. Statistical significance was set at P < 0.05. The overall prevalence of HIVDR was 35%. Not reaching HIV viral load suppression, experiencing antiretroviral treatment, and certain sociodemographic characteristics including age (35+ years), living in a rural area, and particular national contexts (e.g., higher resistance in Rwanda and Zimbabwe) were important factors linked to higher HIVDR likelihood. 
Additionally, the study revealed that having viral load suppression was associated with lower HIVDR likelihood (aOR: 0.31, 95% CI: 0.21–0.45, P < 0.001), whereas experiencing antiretroviral treatment was associated with higher HIVDR likelihood (aOR: 2.6, 95% CI: 1.75–3.91, P < 0.001). Machine learning models confirmed that programmatic and contextual factors outweighed individual characteristics in shaping resistance risk. Predicted probabilities were highest among ART-experienced individuals with unsuppressed viral load, reaching up to 45%. While the LASSO model showed moderate accuracy, the Super Learner ensemble outperformed all models. This study concludes by highlighting the substantial prevalence of HIVDR in SSA, which varies significantly among nations and sociodemographic characteristics. The results highlight the importance of ART use and viral load suppression in determining HIVDR prevalence, underscoring the necessity of focused interventions to enhance viral load monitoring and ART adherence. To combat the growing threat of HIVDR and guarantee the long-term efficacy of HIV treatment programs in the area, ongoing surveillance and context-specific approaches are crucial.

2025-10-13 — Effects of projected increases in heat exposure on linguistic development in two-year-old children: A longitudinal modified treatment policy analysis

Authors: Guillaume Barbalat, A. Guilbert, Marie-Aline Charles, Ian Hough, L. Launay, Itai Kloog, J. Lepeule
Year: 2025
Publication Date: 2025-10-13
Venue: Environmental Epidemiology
DOI: 10.1097/EE9.0000000000000423
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Previous studies have demonstrated that in utero and early life heat exposure can influence neurodevelopment. However, to our knowledge, these investigations have not evaluated realistic counterfactual scenarios; instead, they have primarily relied on static, crude comparisons of extreme temperatures versus a reference temperature over an extended period. Methods: We employed the framework of longitudinal modified treatment policy to examine the impact of heat exposure during prenatal and postnatal periods on the linguistic development of two-year-old children in the Etude Longitudinale Française depuis l’Enfance birth cohort (N = 12,163). Heat exposure was defined as the number of periods when overall, daytime, and nighttime daily temperatures surpassed the 90th percentile (20.6, 27.5, and 15.3 °C, respectively) for at least two consecutive days. Context-specific counterfactual scenarios were constructed by increasing daily temperatures by 1, 2, or 3 °C, in line with projections from Intergovernmental Panel on Climate Change scenarios. Causal effects were estimated by comparing the population mean outcomes under hypothetical counterfactual scenarios vs. those actually observed in the data using a doubly robust estimation technique (targeted maximum likelihood estimation). A library of machine learning algorithms was employed to model the intricate relationships between covariates and both the exposure and outcome variables. Results: In counterfactual scenarios where daily temperature increases by one degree, mean differences in log-transformed population outcome did not reach statistical significance. A two-degree daily increase in nighttime temperature showed a decrease in linguistic development scores of 30% (P < 0.001). A three-degree increase in overall, daytime and nighttime daily temperatures showed a decrease in scores of at least 6% (P < 0.003). 
Conclusion: Our study revealed a negative impact of increased air temperatures on the linguistic development of 2-year-old children in counterfactual scenarios involving two- and three-degree temperature rises. The longitudinal modified treatment policy approach offers valuable new insights for causal inference in environmental epidemiology, particularly through its ability to directly assess the effects of anticipated, policy-relevant temperature changes.

2025-10-13 — Childhood Maltreatment and Risk for Illicit Substance Use: Evidence for Mid-Adolescence as a Sensitive Exposure Period

Authors: Alaptagin Khan, Marty H Teicher
Year: 2025
Publication Date: 2025-10-13
Venue: medRxiv
DOI: 10.1101/2025.10.10.25337769
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Childhood maltreatment is a well-established risk factor for substance misuse. However, it remains unclear whether risk for specific illicit substances is driven primarily by the cumulative burden of adverse experiences or by exposures during specific sensitive developmental periods. Methods: We administered the Maltreatment and Abuse Chronology of Exposure (MACE) scale to 2,013 medically healthy young adults (693M/1320F, ages 18-25) to assess severity of exposure to ten maltreatment types across each year of childhood. Lifetime non-medical use of opioids, stimulants, cocaine, sedatives, hallucinogens, and polysubstance involvement was determined using the Alcohol, Smoking, and Substance Involvement Screening Test (ASSIST). We employed an integrative analytical approach, combining machine learning (random forests) with modern causal inference (Targeted Maximum Likelihood Estimation) and survival modeling. Findings: Although maltreatment multiplicity (number of types) was associated with substance use in initial models, it was not a significant predictor in analyses that included specific type-time exposures. Machine learning and causal inference instead identified peer emotional abuse at age 15 in males and childhood sexual abuse at age 15 in females as potent risk factors for multiple outcomes, including opioid, stimulant, and polysubstance use. These specific exposures increased absolute risk by 20-35% (Risk Ratios: 2- to 4-fold), with associations robust to unmeasured confounding (E-values: 4-10). The apparent population-level dose-response relationship for multiplicity is therefore likely explained by the higher concentration of these critical type-time exposures among individuals with high maltreatment multiplicity. 
Interpretation: Our findings challenge the cumulative burden model, providing robust evidence for mid-adolescence as a sensitive period during which specific maltreatment exposures, peer victimization in males and sexual abuse in females, causally shape trajectories of illicit substance misuse. Prevention strategies targeting these adversities and clinical screening during this developmental window may be crucial for reducing the population burden of substance use disorders.

2025-10-10 — Anxiety symptoms during pregnancy and risk of adverse birth outcomes in Gondar Town Ethiopia

Authors: A. Dadi, T. A. Hassen, D. Ketema, Kedir Y. Ahmed, Z. Kassa, E. Amsalu, G. Kibret, A. A. Alemu, M. G. Bore, Animut Alebel Ayalew, J. E. Shifa, H. M. Bizuayehu
Year: 2025
Publication Date: 2025-10-10
Venue: Scientific Reports
DOI: 10.1038/s41598-025-19379-8
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Anxiety is the most common complication of pregnancy and childbirth and is reported to have a plethora of adverse maternal, birth, and childhood outcomes. There is a lack of studies that looked at the association between pregnancy-related anxiety and the risk of adverse birth outcomes in Ethiopia. We conducted a community-based prospective cohort study in Gondar Town to explore the link between anxiety symptoms during pregnancy and the risk of adverse birth outcomes (low birth weight (LBW), preterm, and stillbirth). We used the three questions listed in the Edinburgh Postnatal Depression Scale (EPDS-3A) to assess the presence of anxiety symptoms during pregnancy. We explored both association and causal effects using modified Poisson regression and the Targeted Maximum Likelihood Estimation (TMLE), respectively. We also calculated the population impacts of risk factors significantly associated with adverse birth outcomes. Of 916 mothers who made up a cohort, 895 completed follow-ups and were included in the final analysis. No evidence of an association was found between anxiety symptoms during pregnancy and: LBW (Adjusted incidence rate ratio (aIRR) = 1.20; 95%CI: 0.49, 2.94), preterm birth (aIRR = 0.77; 95%CI: 0.36, 1.64), and stillbirth (aIRR = 3.29; 95%CI: 0.96, 11.29, p = 0.058). However, other psychosocial factors such as maternal fearful thoughts about delivery and poor stress-coping ability contributed to 34.9% (95% CI: 10.3, 52.7) and 38.3% (95% CI: 12.6, 56.5) of preterm births, respectively. Supportive husbands, on the other hand, averted about 14.7% (95% CI: 5.1, 27.1) of premature births. About 90.0% (95%CI: 82.2, 94.4) and 54.1% (95%CI: 15.0, 75.2) of the risk of LBW was attributed to preterm birth and smoking in pregnancy. There was no evidence of an association between anxiety symptoms during pregnancy and adverse birth outcomes. Other psychosocial factors contributed to or averted adverse birth outcomes. 
Early screening followed by providing proven psychological interventions is key for reducing individual and population-level impacts of psychosocial risk factors associated with adverse birth outcomes.

2025-10-09 — Doubly robust nonparametric efficient estimation for healthcare provider evaluation

Authors: Herbert P Susmann, Yiting Li, Mara A McAdams-DeMarco, Iván Díaz, Wenbo Wu
Year: 2025
Publication Date: 2025-10-09
Venue: Journal of the Royal Statistical Society: Series A (Statistics in Society)
DOI: 10.1093/jrsssa/qnaf145
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Provider profiling has the goal of identifying healthcare providers with exceptional patient outcomes. When evaluating providers, adjustment is necessary to control for differences in case-mix between different providers. Direct and indirect standardization are two popular risk adjustment methods. In causal terms, direct standardization examines a counterfactual in which the entire target population is treated by one provider. Indirect standardization, commonly expressed as a standardized outcome ratio, examines the counterfactual in which the population treated by a provider had instead been randomly assigned to another provider. Our first contribution is to present nonparametric efficiency bound for direct and indirectly standardized provider metrics by deriving their efficient influence functions. Our second contribution is to propose fully nonparametric estimators based on targeted minimum loss-based estimation that achieve the efficiency bounds. The finite-sample performance of the estimator is investigated through simulation studies. We apply our methods to evaluate dialysis facilities in New York State in terms of unplanned readmission rates using a large Medicare claims dataset. A software implementation of our methods is available in the R package TargetedRisk.

2025-10-08 — Super learner for survival prediction in case-cohort and generalized case-cohort studies.

Authors: Haolin Li, Haibo Zhou, David J Couper, Jianwen Cai
Year: 2025
Publication Date: 2025-10-08
Venue: Biometrics
DOI: 10.1093/biomtc/ujaf155
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-10-06 — Real-world cardiovascular effectiveness of sustained glucagon-like peptide 1 (GLP-1) receptor agonist usage in type 2 diabetes

Authors: K. K. Sørensen, P. Yazdanfard, B. Zareini, U. Pedersen-Bjergaard, Vanja Kosjerina, M. Andersen, Anders Munch, J. S. Ohlendorff, Stefanie Schmid, S. Lanzinger, Pratik Choudhary, C. Gillies, Safoora Gharibzadeh, Marcus Lind, Viktor Tasselius, Jens Michelsen, T. Gerds, Christian Torp-Pedersen
Year: 2025
Publication Date: 2025-10-06
Venue: Cardiovascular Diabetology
DOI: 10.1186/s12933-025-02915-1
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Cardiovascular outcome trials have shown that glucagon-like peptide 1 receptor agonists (GLP1-RAs) reduce cardiovascular event rates more effectively than placebo in patients with type 2 diabetes at increased cardiovascular risk. However, the generalizability of these findings to real-world settings remains uncertain. This study aimed to evaluate the real-world cardiovascular effectiveness of sustained GLP1-RA use compared to dipeptidyl peptidase 4 inhibitor (DPP-4i) use over 3.5 years. Using Danish nationwide registries, we emulated a target trial to assess the real-world effectiveness of GLP1-RAs in a population of individuals with type 2 diabetes mirroring the inclusion and exclusion criteria from the Liraglutide Effect and Action in Diabetes: Evaluation of Cardiovascular Outcome Results (LEADER) trial. The study period was 2012–2022. Outcomes included the composite of myocardial infarction, stroke, and cardiovascular mortality (3P-MACE), as well as each component individually, alongside all-cause mortality, heart failure, angina pectoris, and revascularization. Longitudinal Targeted Minimum Loss-based Estimation, a method that adjusts for both baseline and time-varying confounding, was used to estimate absolute risks of cardiovascular outcomes under sustained use of GLP1-RA and DPP-4i (active comparator). We included 6,681 people initiating GLP1-RA and 19,072 initiating DPP-4i. Accounting for baseline and time-varying confounding, sustained GLP1-RA use showed a 2.5% (95% CI 0.8–4.1%) risk reduction of 3P-MACE over 3.5 years. Risk reductions for cardiovascular mortality, all-cause mortality, heart failure, and unstable angina pectoris were 2.3% (95% CI 1.4–3.1%), 2.5% (95% CI 0.7–4.3%), 0.9% (95% CI 0.01–1.8%), and 0.7% (95% CI 0.01–1.3%), respectively.
No significant differences were observed for myocardial infarction, stroke, or revascularization, with risk differences of 0.1% (95% CI −1.0 to 0.8%), 0.8% (95% CI −0.2 to 1.7%), and 0.2% (95% CI −0.7 to 1.1%), respectively. This real-world study confirms the cardiovascular benefits of GLP1-RAs over DPP-4is, particularly for reducing cardiovascular and all-cause mortality under continuous treatment exposure in patients with type 2 diabetes at increased cardiovascular risk.

2025-10-01 — RWD195 ESTIMATING THE AVERAGE TREATMENT EFFECT AMONG PATIENTS WITH HEART FAILURE USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION: IMPLICATIONS FROM A LARGE-SCALE HEALTH ADMINISTRATIVE DATABASE

Authors: Yao Xu, Seok-Won Kim, Fei Zhao
Year: 2025
Publication Date: 2025-10-01
Venue: Value in Health Regional Issues
DOI: 10.1016/j.vhri.2025.101262
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2025-10-01 — 8281672 Comparison of Targeted maximum likelihood estimation and logistic regression in the analysis of retrospective cohort data

Authors: Adriano Dias, Helio Rubens Nunes, Carlos Ruíz-Frutos, Juan Gómez-Salgado, M. Alonso, João Marcos Bernardes, J. R. Lacalle-Remigio
Year: 2025
Publication Date: 2025-10-01
Venue: Wednesday Posters
DOI: 10.1136/oemed-2025-epicohabstracts.391
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2025-09-30 — Type 2 diabetes, sodium-glucose cotransporter-2 inhibitors and cardiovascular outcomes: real world evidence versus a randomised clinical trial

Authors: P. Yazdanfard, K. K. Sørensen, B. Zareini, U. Pedersen-Bjergaard, J. S. Ohlendorff, Anders Munch, M. Andersen, R. Hasselbalch, H. Imberg, Viktor Tasseleus, Marcus Lind, Jonathan Valabhji, Pratik Choudhary, K. Khunti, Stefanie Schmid, S. Lanzinger, Julia K Mader, T. Gerds, Christian Torp-Pedersen
Year: 2025
Publication Date: 2025-09-30
Venue: Cardiovascular Diabetology
DOI: 10.1186/s12933-025-02924-0
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Sodium-glucose cotransporter-2 inhibitors (SGLT2i) have demonstrated cardiovascular benefits in randomised controlled trials (RCT). However, the controlled nature of RCTs and the selected trial populations limit their generalizability to real-world practice. Substantial methodological advances now enable robust estimation of absolute risks, risk differences, and continuous on-treatment effects, providing more clinically interpretable measures of SGLT2i effectiveness than previously possible with traditional models reliant on hazard ratios. We conducted a target trial emulation using nationwide Danish registries to evaluate the real-world effectiveness of SGLT2i versus dipeptidyl peptidase-4 inhibitors (DPP4i) in individuals with type 2 diabetes (T2D) and cardiovascular disease (CVD). Outcomes included major adverse cardiovascular events (MACE), heart failure hospitalizations, and all-cause mortality. Absolute risks and risk differences for three years of continuous treatment were estimated using longitudinal targeted minimum loss-based estimation, adjusting for baseline and time-varying confounders. Among 116,823 patients who redeemed SGLT2i or DPP4i for the first time, 13,524 met inclusion and exclusion criteria (SGLT2i: 6,025; DPP4i: 7,499). At three years, the risk of MACE was 11.5% for SGLT2i users versus 14.2% for DPP4i users (risk-difference: 2.8 percentage-points, 95% CI: 1.1–4.4%). Heart failure hospitalizations were lower by 5.1 percentage-points (95% CI: 4.3–6.0%), and all-cause mortality by 3.1 percentage-points (95% CI: 1.5–4.7%), all favoring SGLT2i. Notably, we also observed a risk reduction in stroke by 2.4 percentage-points (95% CI: 1.7–3.1%). This study demonstrates the real-world effectiveness of continuous SGLT2i treatment in reducing cardiovascular events in patients with T2D and CVD. The absolute benefit of SGLT2i was larger in a real world population than in the intention to treat estimate in clinical trials.

2025-09-29 — Estimating heterogeneous impacts of subsidised health insurance: A causal machine learning approach

Authors: Vishalie Shah, Noemi Kreif
Year: 2025
Publication Date: 2025-09-29
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0315057
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The evaluation of social and health policies often necessitates understanding the variations in impacts based on recipients’ observed characteristics, underscoring the value of estimating treatment effect heterogeneity. In this study, we leverage predictive and causal machine learning to assess the impact of the subsidised component of Indonesia’s National Health Insurance Programme (“JKN”) on healthcare utilisation in 2017. We employ causal forests for estimating heterogeneous treatment effects and the super learner algorithm for prediction tasks. Our approach addresses the prevalence of zeros in the utilisation outcomes through a two-part model, which separates the outcome model into zero and non-zero counts. This allows for distinct investigation of policy impacts on the decision to seek care and the quantity of care consumed. We interpret and summarise treatment effect heterogeneity using various approaches, including data-driven subgroup analyses and linear projections, which are grounded in theory. Our results demonstrate a positive average impact on healthcare demand with evident heterogeneity; for instance, the increase in demand varies among recipients. We also find that the effect is modified by a set of theoretically motivated covariates and those identified through our data-driven approach.

2025-09-26 — Development and validation of super learner models to predict small and large for gestational age in the second generation

Authors: Mary M Brown, Stefan Kuhle, Bruce Smith, Victoria M. Allen, Jennifer Payne, Christy G. Woolcott
Year: 2025
Publication Date: 2025-09-26
Venue: Scientific Reports
DOI: 10.1038/s41598-025-18466-0
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Prediction of small (SGA) and large for gestational age (LGA) using routinely collected antenatal data remains suboptimal, particularly among nulliparous women. In this study, models for SGA (< 10th percentile) and LGA (> 90th percentile) were developed by combining grandmaternal pregnancy-related information and maternal birth characteristics (“G0 predictors”) with maternal clinical factors available at 26 weeks’ gestation (“G1 predictors”). The study used a cohort of first-born, singleton births to nulliparous women in Nova Scotia, Canada (1981–2011), and their mothers, from the Nova Scotia Atlee Perinatal Database. Models using G0 predictors, G1 predictors, and their combination were developed with Super Learner, an ensemble machine learning algorithm, and internally validated using nested cross-validation. Discrimination was assessed via the area under the receiver operating characteristic curve (AUC-ROC) and the precision-recall curve (AUC-PR); calibration was also evaluated. Among 9,097 grandmother-mother-infant triads, 902 (9.9%) infants were SGA and 891 (9.8%) were LGA. Including G0 predictors improved discrimination compared to G1-only models (AUC-ROC 0.69 vs. 0.66 for SGA and 0.71 vs. 0.66 for LGA; AUC-PR: 0.21 vs. 0.18 for SGA and 0.22 vs. 0.18 for LGA). Models fitted using both sets of predictors were well calibrated. While incorporating intergenerational information modestly improved prediction, overall predictive performance remains poor.

2025-09-08 — Colorectal Cancer Care Quality in a Developing Country: Insights from a Comparison of Teaching and Non-teaching Hospitals in Iran

Authors: Mohammad Reza Rouhollahi, Mahdi Aghili, Saeed Nemati, M. Mohagheghi, Farid Azmoudeh Ardalan, H. Mahmoodzadeh, M. Mirzania, Mohammad Shirkhoda, S. H. Yahyazadeh, A. Muhammadnejad, Sepideh Abdi, M. Nikoo, K. Zendehdel
Year: 2025
Publication Date: 2025-09-08
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0326796
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Background: Our study represents the first effort in the Eastern Mediterranean Region to identify disparities in the quality of colorectal cancer (CRC) care in Iran. Methods: We established a collaborative registry program for non-metastatic CRC patients to evaluate survival rates between teaching cancer centers (TCCs) and a high-volume, non-teaching, non-cancer center (NTNC). The study included a diverse patient population and considered various factors such as cancer stage, margin involvement, adherence to guidelines for adjuvant and neoadjuvant treatments, emergency surgeries, socioeconomic status, and risk of surgery. We utilized a multivariate Cox regression model and the targeted maximum likelihood estimator (TMLE) to analyze survival disparities in colorectal cancer between TCCs and the NTNC. Results: We recruited 668 CRC patients, including 320 with colon cancer and 298 with rectal cancer. Patients who underwent surgery at teaching cancer centers (TCCs) displayed significantly higher quality of care and better outcomes than those treated at the non-teaching, non-cancer center (NTNC). The adjusted hazard ratios (HR) were 1.97 (95% CI 1.21–3.21) for colon cancer and 1.54 (95% CI 1.01–2.55) for rectal cancer. Additionally, we observed significant causal mortality risk ratios (RR) based on hospital type for overall colorectal cancer (RR = 1.42, 95% CI 1.12–1.81) and specifically for colon (RR = 1.48, 95% CI 1.04–2.11) and rectal cancer (RR = 1.39, 95% CI 1.01–2.02). Conclusion: The survival disparities in colon and rectal cancers between TCCs and NTNCs highlight a significant gap in CRC care in Iran. It is essential to expand this study nationally and implement the knowledge and experiences from TCCs in other hospitals to improve the quality of care and enhance patient outcomes.

2025-09-08 — A causal inference framework to compare the effectiveness of life‐sustaining ICU therapies—using the example of cancer patients with sepsis

Authors: J. Matos, Tristan Struja, N. Woite, David Restrepo, A. K. Waschka, L. A. Celi, C. M. Sauer
Year: 2025
Publication Date: 2025-09-08
Venue: International Journal of Cancer
DOI: 10.1002/ijc.70138
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
The rise in cancer patients could lead to an increase in intensive care units (ICUs) admissions. We explored differences in treatment practices and outcomes of invasive therapies between patients with sepsis with and without cancer. Adults from 2008 to 2019 admitted to the ICU for sepsis were extracted from the databases MIMIC‐IV and eICU‐CRD. Using Extreme Gradient Boosting, we estimated the odds for invasive mechanical ventilation (IMV) or vasopressors. Targeted maximum likelihood estimation (TMLE) models estimated treatment effects of IMV and vasopressors on in‐hospital mortality and 28 hospital‐free days. 58,988 adult septic patients were included, of which 6145 had cancer. In‐hospital mortality was higher for cancer patients (30.3% vs. 16.1%). Patients with cancer had lower odds of receiving IMV (aOR [95%CI], 0.94 [0.90–0.97]); pronounced for hematologic patients (aOR 0.89 [0.84–0.93]). Odds for vasopressors were also lower for hematologic patients (aOR 0.89 [0.84–0.94]). TMLE models found IMV to be overall associated with higher in‐hospital mortality for solid and hematological patients (ATE 3% [1%–5%], 6% [3%–9%], respectively), while vasopressors were associated with higher in‐hospital mortality for patients with solid and metastatic cancer (ATE 6% [4%–8%], 3% [1%–6%], respectively). We utilized US‐wide ICU data to estimate a relationship between mortality and the use of common therapies. With the exception of hematologic patients being less likely to receive IMV, we did not find differential treatment patterns. We did not demonstrate an average survival benefit for therapies, underscoring the need for a more granular analysis to identify subgroups who benefit from these interventions.

2025-09-03 — The super learner for time-to-event outcomes: A tutorial

Authors: Ruth H. Keogh, Karla Diaz-Ordaz, N. Geloven, J. M. Gran, K. Tanner
Year: 2025
Publication Date: 2025-09-03
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Estimating risks or survival probabilities conditional on individual characteristics based on censored time-to-event data is a commonly faced task. This may be for the purpose of developing a prediction model or may be part of a wider estimation procedure, such as in causal inference. A challenge is that it is impossible to know at the outset which of a set of candidate models will provide the best predictions. The super learner is a powerful approach for finding the best model or combination of models ('ensemble') among a pre-specified set of candidate models or 'learners', which can include parametric and machine learning models. Super learners for time-to-event outcomes have been developed, but the literature is technical and a reader may find it challenging to gather together the full details of how these methods work and can be implemented. In this paper we provide a practical tutorial on super learner methods for time-to-event outcomes. An overview of the general steps involved in the super learner is given, followed by details of three specific implementations for time-to-event outcomes. We cover discrete-time and continuous-time versions of the super learner, as described by Polley and van der Laan (2011), Westling et al. (2023) and Munch and Gerds (2024). We compare the properties of the methods and provide information on how they can be implemented in R. The methods are illustrated using an open access data set and R code is provided.
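
Illustrative note (not part of the abstract): the general idea of picking the best candidate by cross-validated risk can be sketched as below. This is a minimal sketch under loud assumptions: it uses an uncensored continuous outcome with squared-error loss rather than time-to-event data, it implements only the "discrete" super learner (select a single best learner, no weighted ensemble), and every name in it (`fit_mean`, `fit_linear`, `cv_risk`, `discrete_super_learner`) is hypothetical and not taken from the paper or from any R package.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Candidate learner 1: ignore x and always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate learner 2: simple least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_risk(fit, xs, ys, k=5, seed=0):
    """K-fold cross-validated mean squared error of one learner."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint validation folds
    losses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        losses.extend((ys[i] - model(xs[i])) ** 2 for i in held_out)
    return mean(losses)


def discrete_super_learner(learners, xs, ys, k=5, seed=0):
    """Pick the learner with the smallest CV risk and refit it on all data."""
    risks = {name: cv_risk(fit, xs, ys, k, seed) for name, fit in learners.items()}
    best = min(risks, key=risks.get)
    return best, risks, learners[best](xs, ys)
```

Calling `discrete_super_learner({"mean": fit_mean, "linear": fit_linear}, xs, ys)` returns the name of the selected learner, the cross-validated risk of each candidate, and the selected learner refit on the full data. Handling censoring, as the tutorial does, would require a loss function and learners suited to time-to-event outcomes, which this sketch does not attempt.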

2025-09-01 — Finding the Optimal Number of Splits and Repetitions in Double Cross‐Fitting Targeted Maximum Likelihood Estimators

Authors: Mohammad Ehsanul Karim, Momenul Haque Mondol
Year: 2025
Publication Date: 2025-09-01
Venue: Pharmaceutical statistics
DOI: 10.1002/pst.70022
Link: Semantic Scholar
Matched Keywords: super learner, tmle

Abstract:
Flexible machine learning algorithms are increasingly utilized in real‐world data analyses. When integrated within double robust methods, such as the Targeted Maximum Likelihood Estimator (TMLE), complex estimators can result in significant undercoverage—an issue that is even more pronounced in singly robust methods. The Double Cross‐Fitting (DCF) procedure complements these methods by enabling the use of diverse machine learning estimators, yet optimal guidelines for the number of data splits and repetitions remain unclear. This study aims to explore the effects of varying the number of splits and repetitions in DCF on TMLE estimators through statistical simulations and a data analysis. We discuss two generalizations of DCF beyond the conventional three splits and apply a range of splits to fit the TMLE estimator, incorporating a super learner without transforming covariates. The statistical properties of these configurations are compared across two sample sizes (3000 and 5000) and two DCF generalizations (equal splits and full data use). Additionally, we conduct a real‐world analysis using data from the National Health and Nutrition Examination Survey (NHANES) 2017–18 cycle to illustrate the practical implications of varying DCF splits, focusing on the association between obesity and the risk of developing diabetes. Our simulation study reveals that five splits in DCF yield satisfactory bias, variance, and coverage across scenarios. In the real‐world application, the DCF TMLE method showed consistent risk difference estimates over a range of splits, though standard errors increased with more splits in one generalization, suggesting potential drawbacks to excessive splitting. This research underscores the importance of judicious selection of the number of splits and repetitions in DCF TMLE methods to achieve a balance between computational efficiency and accurate statistical inference. Optimal performance seems attainable with three to five splits. 
Among the generalizations considered, using full data for nuisance estimation offered more consistent variance estimation and is preferable for applied use. Additionally, increasing the repetitions beyond 25 did not enhance performance, providing crucial guidance for researchers employing complex machine learning algorithms in causal studies and advocating for cautious split management in DCF procedures.

2025-08-27 — Causal estimation of time-varying treatments in observational studies: a scoping review of methods, applications, and missing data practices

Authors: Mercy Rop, I. Maposa, Taryn Young, R. Machekano
Year: 2025
Publication Date: 2025-08-27
Venue: BMC Medical Research Methodology
DOI: 10.1186/s12874-025-02633-y
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Estimating causal effects of time-varying treatments or exposures in observational studies is challenging due to time-dependent confounding and missing data, necessitating advanced statistical approaches for accurate inference. Previous reviews indicate that singly robust methods are prevalent in epidemiological studies despite the availability of more robust alternatives that better handle time-varying confounding. Although common in longitudinal studies, missing data are often inadequately reported and addressed, potentially compromising the validity of estimates. Whether this dependence on less robust methods and inadequate handling of missing data persists in time-varying treatment settings remains unclear. This review aimed to identify current practices, methodological trends, and gaps in the causal estimation of time-varying treatments. We conducted a scoping review to map causal methodologies for time-varying treatments in epidemiological studies and identify trends and gaps. To capture the most recent developments, we searched PubMed, Scopus, and Web of Science for articles published between 2023 and 2024. A structured questionnaire was used to extract key methodological aspects, and findings were summarized using descriptive statistics. Of the 424 articles, 63 met the eligibility criteria, with five added from citations and references, totalling 68 for analysis. Among these, 78% addressed epidemiological questions, 13% included methodological illustrations, and 9% focused solely on methods. Singly robust methods dominated, with inverse probability of treatment weighting (IPTW) being the most common (64.3%), followed by targeted maximum likelihood estimation (TMLE) (14.3%). The emergence of new estimation approaches was also noted. Missing data handling remained inadequate; 33% did not report the extent of missingness, 95.2% lacked assumptions, and sensitivity analysis was performed in only 14.5% of the articles. 
Multiple imputation (MI) was more prevalent (29%), while complete case analysis (11.3%) was likely underreported, given 33.9% omitted strategy details. Persistent reliance on singly robust methods, underutilization of doubly robust approaches, and inadequate missing data handling highlight ongoing gaps in evaluating time-varying treatments. While newer estimation approaches are emerging, their adoption remains limited. These trends, alongside the growing complexity of real-world data and the demand for evidence-driven care, call for greater methodological rigor, wider adoption of robust approaches, and enhanced reporting transparency.

2025-08-22 — Ensemble Learning-Based Mental Analysis and Prediction of Mental Disorder

Authors: L.Sudha, P. Sreedevi, Dhomala Aswani, Dr. K. V. Panduranga Rao, Dr. C. Dastagiraiah, Dr. K Pushpa
Year: 2025
Publication Date: 2025-08-22
Venue: 2025 International Conference on Sustainability, Innovation & Technology (ICSIT)
DOI: 10.1109/ICSIT65336.2025.11293926
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Given the increasing prevalence of mental health disorders, the use of evidence-based prediction tools to support early intervention is paramount, which can reduce stigma through sustainability. Because of the importance of using the Super Learner algorithm to improve prediction accuracy, this chapter presents a new framework for predicting mental disorders using ensemble machine learning methods. The new framework collects survey responses from a participant across varying levels that each have a different set of questions when they participate in a study over multiple sessions. The algorithm analyzes the data and predicts an individual's mental health status from the survey data of only a few levels, for the purpose of providing targeted intervention recommendations. We applied the Super Learner method to combine different machine learning models, and the model uses a large data set to train and optimize the prediction system. The modified survey used for assessment increases accuracy adapted to the individual's questionnaire by asking about multiple sessions. The initial results provide an indication that there is a high level of accuracy predicting mental state, and the method has the potential for identifying early intervention needed and reducing limitations related to stigma for mental health.

2025-08-18 — TMLE.jl: Targeted Minimum Loss-Based Estimation in Julia

Authors: Olivier Labayle, C. P. Ponting, Mark J. van der Laan, A. Khamseh, S. Beentjes
Year: 2025
Publication Date: 2025-08-18
Venue: Journal of Open Source Software
DOI: 10.21105/joss.08446
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2025-08-17 — Uncovering the Nexus between ESG Reports and ESG Scores across Various Liquidity Levels: Evidence from Publicly Traded Turkish Companies by Machine Learning Algorithms

Authors: Serpil Kılıç Depren, Dilvin Taşkın, Talat Ulussever
Year: 2025
Publication Date: 2025-08-17
Venue: Journal of Sustainable Economies
DOI: 10.51300/jse-2025-154
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Considering the recent restructuring of environmental, social, and governance (ESG) reports in Türkiye, this study uncovers the effectiveness of ESG reports in ESG score estimation across diverse liquidity levels. Accordingly, the study examines four different samples: the full sample, the Borsa Istanbul 100 (XU100) index, the Borsa Istanbul 50 (XU050) index, and the Borsa Istanbul 30 (XU030) index, where 102, 60, 43, and 26 companies exist, respectively. The study considers restructured ESG reports for 2022 and 2023 and performs five different machine learning (ML) algorithms. The findings demonstrate that (i) among all segments, the environment segment includes principles that have the highest importance, while social, common, and governance segments follow, respectively; (ii) absolute and relative variable importance of ESG principles differentiate; (iii) super learner (SL) is the best ML algorithm, where its estimative power (R2) is around 95% for the best estimation. Thus, the results demonstrate that the estimative power of restructured ESG reports in the estimation of ESG scores is quite high. Hence, the study highlights a varying contribution of ESG segments and principles to the ESG scores of the companies and reveals a nonlinear need by companies to focus on highly important ESG principles so that companies can stimulate their ESG scores.

2025-08-14 — A comparison of causal inference methods for evaluating multiple treatment groups

Authors: Shuai Chen, Hao Wu, Hongwei Zhao
Year: 2025
Publication Date: 2025-08-14
Venue: Journal of nonparametric statistics (Print)
DOI: 10.1080/10485252.2025.2544936
Link: Semantic Scholar
Matched Keywords: super learner, tmle

Abstract:
Causal inference is formulated using the counterfactual framework, enabling direct investigation of causal questions. Causal inference methods can incorporate machine learning techniques into the estimation process, allowing for more flexible models. However, the integration of machine learning methods adds complexity to statistical inference. In this paper, we systematically assess several methods for making causal inference with multiple treatment groups, including the outcome regression, inverse propensity score weighting, double-robust estimators, and their counterparts when employing a super learner in the estimation process, as well as the targeted maximum likelihood estimator (TMLE). We conduct numerical studies with complex data-generating models to evaluate these different estimators. Our results suggest that the double-robust estimator, when combined with machine learning, is the most favourable approach, demonstrating lower biases, a valid variance estimator, and improved coverage probabilities for the 95% confidence interval.

2025-08-13 — Linking GFAP Levels to Speech Anomalies in Acute Brain Injury: A Simulation Based Study

Authors: Shamaley Aravinthan, Bin Hu
Year: 2025
Publication Date: 2025-08-13
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Glial fibrillary acidic protein (GFAP) is a biomarker for intracerebral hemorrhage and traumatic brain injury, but its link to acute speech disruption is untested. Speech anomalies often emerge early after injury, enabling rapid triage. Methods: We simulated a cohort of 200 virtual patients stratified by lesion location, onset time, and severity. GFAP kinetics followed published trajectories; speech anomalies were generated from lesion-specific neurophysiological mappings. Ensemble machine-learning models used GFAP, speech, and lesion features; robustness was tested under noise, delays, and label dropout. Causal inference (inverse probability of treatment weighting and targeted maximum likelihood estimation) estimated directional associations between GFAP elevation and speech severity. Findings: GFAP correlated with simulated speech anomaly severity (Spearman rho = 0.48), strongest for cortical lesions (rho = 0.55). Voice anomalies preceded detectable GFAP rise by a median of 42 minutes in cortical injury. Classifier area under the curve values were 0.74 (GFAP only), 0.78 (voice only), and 0.86 for the fused multimodal model, which showed higher sensitivity in mild or ambiguous cases. Causal estimates indicated higher GFAP increased the modeled probability of moderate-to-severe speech anomalies by 32 to 35 percent, independent of lesion site and onset time. Conclusion: These results support a link between GFAP elevation and speech anomalies in acute brain injury and suggest integrated biochemical-voice diagnostics could improve early triage, especially for cortical injury. Findings are simulation-based and require validation in prospective clinical studies with synchronized GFAP assays and speech recordings.

2025-08-13 — Developing an Inhaled NEU1 Inhibitor for Cystic Fibrosis via Pharmacokinetic and Biophysical Modeling

Authors: Yousra Hassan Alsaad Almeshale, Abdulelah Hassan Almeshali, Omar Alsaddique, Noura Jandali, Nadeen Garaween, Bin Hu
Year: 2025
Publication Date: 2025-08-13
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Cystic fibrosis (CF) airway mucus exhibits reduced mucin sialylation, increasing viscosity and impairing mucociliary clearance (MCC). NEU1 inhibition has been proposed to restore MCC, but its quantitative pharmacokinetic and rheological effects, particularly with inhaled delivery, remain uncharacterized. Objective: To develop an integrated pharmacokinetic/pharmacodynamic (PK/PD) and biophysical model to assess the efficacy of an inhaled NEU1 inhibitor. Methods: Empirical and preclinical NEU1 inhibition data were combined with inhalation PK/PD modeling and a biophysical viscosity framework linking mucin sialylation and extracellular DNA. Synthetic cohort simulations (N = 200) were reconciled with empirical PK benchmarks using Latin hypercube parameter sampling. Cross-validation, hold-out testing, and causal inference methods (inverse probability of treatment weighting and targeted maximum likelihood estimation) quantified predicted effects on lung function (delta FEV1). Results: With reconciled parameters (F_dep = 0.12; k_abs = 0.21 per hour; k_muc = 0.24 per hour), epithelial lining fluid drug levels reached a peak concentration of 7.5 micromolar (95 percent CI: 6 to 10 micromolar), achieving IC50 coverage for approximately 10 hours per day and greater than 80 percent modeled NEU1 inhibition. Predicted mucus viscosity reduction averaged 25 to 28 percent. Causal inference estimated delta FEV1 improvement of +0.13 liters (95 percent CI: 0.10 to 0.15 liters), with about 70 percent mediated via MCC. Conclusions: Empirically anchored PK/PD and biophysical modeling support the feasibility of inhaled NEU1 inhibition as a rheology-targeting strategy in CF, projecting clinically realistic efficacy while maintaining pharmacological viability. This calibrated proof of concept warrants in vivo validation in CF models.

2025-08-13 — Determining the Survival Impact and Cost-Effectiveness of Multi-Gene Panel Sequencing in Metastatic Colorectal Cancer With Super Learning Approaches.

Authors: Emanuel Krebs, Deirdre Weymann, Howard J. Lim, Stephen Yip, D.A. Regier
Year: 2025
Publication Date: 2025-08-13
Venue: Health Services Research
DOI: 10.1111/1475-6773.70009
Link: Semantic Scholar
Matched Keywords: super learning, tmle

Abstract:
OBJECTIVE To determine the effectiveness and cost-effectiveness of multi-gene panel sequencing compared to single-gene KRAS testing for metastatic colorectal cancer (mCRC). STUDY SETTING AND DESIGN British Columbia, Canada (BC) is a provincial single-payer public healthcare system, and it was the first province to publicly reimburse multi-gene sequencing for mCRC. Panels expand treatment de-escalation by expanding RAS testing for more precise targeting of anti-EGFR therapies. Reimbursement of panels remains unequal across healthcare systems given uncertain clinical and economic impacts. Our quasi-experimental study design followed the target trial emulation approach, emulating random treatment assignment with two different methods to examine the sensitivity of estimates: inverse probability of treatment weighting estimated with super learning (SL-IPTW) and 1:1 genetic algorithm-based matching, a machine learning approach. We then estimated mean three-year survival time and costs (public healthcare payer perspective; 2021CAD) and calculated the incremental net monetary benefit (INMB) for life-years gained (LYG) at $50,000/LYG using weighted linear regression and nonparametric bootstrapping, also accounting for inverse probability of censoring weights. Our sensitivity analysis estimated LYG using targeted minimum loss-based estimation (TMLE), a doubly robust approach that also uses super learning. DATA SOURCES AND ANALYTICAL SAMPLE Patient-level linked administrative health databases capturing cancer and non-cancer care for all BC adults with metastatic colorectal cancer between 2016 and 2019. PRINCIPAL FINDINGS Our study included 892 patients (84.3%) receiving multi-gene panels and 166 (15.7%) receiving single-gene testing. INMB estimates were similar for SL-IPTW ($20,397; 95% CI: $9317, $34,862) and matching ($19,569; 95% CI: $8509, $31,447), with 99.3% and 98.8% probabilities, respectively, of panels being cost-effective. We found statistically significant survival benefits with LYG of 0.31 (SL-IPTW; 95% CI: 0.04, 0.54), 0.25 (matching; 95% CI: 0.03, 0.47) and 0.19 (TMLE; 95% CI: 0.02, 0.37). CONCLUSIONS Survival impacts were robust to super learning approaches. Real-world evidence demonstrates that reimbursing multi-gene sequencing for more precise targeting of mCRC treatments provides value for healthcare systems and clinically important benefits to patients.
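
As a rough illustration of the net-benefit arithmetic this abstract refers to, the sketch below applies the textbook definition INMB = (willingness to pay) × (incremental effect) − (incremental cost) and computes the share of bootstrap replicates with a positive INMB. The function names and all input numbers are hypothetical and are not taken from the study, which estimated these quantities with weighted regression and nonparametric bootstrapping.

```python
def inmb(delta_effect, delta_cost, wtp):
    """Incremental net monetary benefit: wtp * delta_effect - delta_cost."""
    return wtp * delta_effect - delta_cost


def prob_cost_effective(inmb_draws):
    """Share of bootstrap INMB draws that are strictly positive."""
    return sum(1 for x in inmb_draws if x > 0) / len(inmb_draws)


# Hypothetical inputs: 0.25 life-years gained, $2,500 extra cost,
# and a willingness to pay of $50,000 per life-year gained.
example_inmb = inmb(delta_effect=0.25, delta_cost=2500.0, wtp=50000.0)

# Hypothetical bootstrap draws of the INMB (illustrative values only).
example_prob = prob_cost_effective([12000.0, 8000.0, -500.0, 15000.0])
```

A positive INMB means the intervention is considered cost-effective at the chosen willingness-to-pay threshold.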

2025-08-12 — Efficient Statistical Estimation for Sequential Adaptive Experiments with Implications for Adaptive Designs

Authors: Wenxin Zhang, M. V. D. Laan
Year: 2025
Publication Date: 2025-08-12
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Adaptive experimental designs have gained popularity in clinical trials and online experiments. Unlike traditional, fixed experimental designs, adaptive designs can dynamically adjust treatment randomization probabilities and other design features in response to data accumulated sequentially during the experiment. These adaptations are useful to achieve diverse objectives, including reducing uncertainty in the estimation of causal estimands or increasing participants' chances of receiving better treatments during the experiment. At the end of the experiment, it is often desirable to answer causal questions from the observed data. However, the adaptive nature of such experiments and the resulting dependence among observations pose significant challenges to providing valid statistical inference and efficient estimation of causal estimands. Building upon the Targeted Maximum Likelihood Estimator (TMLE) framework tailored for adaptive designs (van der Laan, 2008), we introduce a new Adaptive-Design-Likelihood-based TMLE (ADL-TMLE) to estimate a wide class of causal estimands from adaptive experiment data, including the average treatment effect as our primary example. We establish asymptotic normality and semiparametric efficiency of ADL-TMLE under relaxed positivity and design stabilization assumptions for adaptive experiments. Motivated by these results, we further propose a novel adaptive design aimed at minimizing the variance of the estimator based on data generated under that design. Simulations show that ADL-TMLE demonstrates superior variance-reduction performance across different types of adaptive experiments, and that the proposed adaptive design attains lower variance than the standard efficiency-oriented adaptive design. Finally, we generalize our framework to broader settings, including those with longitudinal structures.

2025-08-01 — Minor impact of anastomotic leakage on long-term quality of life after anterior resection: a population-based cohort study

Authors: Marcus Lindsköld, Anders Gerdin, Jennifer Park, Jenny Häggström, M. Lydrup, P. Matthiessen, H. Jutesten, Sofia J. Sandberg, E. Angenete, Petrus Vinnars, P. Buchwald, M. Rutegård
Year: 2025
Publication Date: 2025-08-01
Venue: British Journal of Surgery
DOI: 10.1093/bjs/znaf149.057
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Anastomotic leakage (AL) following anterior resection (AR) for rectal cancer impacts morbidity and bowel dysfunction. The aim of this study was to evaluate the impact of leakage on health-related quality of life (HRQoL) three years after surgery, as the evidence regarding long-term effects is limited. This population-based observational study included patients who underwent AR in Sweden between 2015 and 2017, retrieved from the Swedish Colorectal Cancer Registry. The main outcome measure was the summary score of EORTC QLQ–C30, a validated questionnaire developed to assess the quality of life of cancer patients, three years after surgery; secondary outcomes were from the colorectal-cancer specific EORTC QLQ–CR29. Targeted maximum likelihood estimation was used to assess the influence of leakage on HRQoL, accounting for confounders. Of 1,778 eligible patients, 1,178 (66.3%) responded, including 104 (8.8%) with AL. Patients with leakage reported a significantly lower summary score, considered a small difference (80 vs. 86, p < 0.001); this effect was estimated at -4 (p = 0.002) after adjustment. These patients also had worse body image, more sore skin around the anus and more leakage of stool from the stoma bag (if present), considered medium differences. AL following AR seems to have a minor negative impact on HRQoL three years postoperatively, while these patients have a worse reported body image, sore skin around the anus and leakage of stool from the stoma bag. These findings highlight the importance of patient counselling and long-term follow-up for patients with AL.

2025-07-31 — Maximizing oil recovery in sandstone reservoirs through optimized ASP injection using the super learner algorithm

Authors: D. F. Putra, Mohd Zaidi Jaafar, Ku Muhd Na’im Khalif, Apri Siswanto, I. Lukman, A. Kurniawan
Year: 2025
Publication Date: 2025-07-31
Venue: Communications in Science and Technology
DOI: 10.21924/cst.10.1.2025.1649
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Optimizing the Alkaline-Surfactant-Polymer (ASP) injection process remains a persistent challenge in Enhanced Oil Recovery (EOR), particularly in heterogeneous sandstone reservoirs where traditional reservoir simulators are constrained by high computational demands and limited flexibility. This study introduces a novel application of the Super Learner (SL) ensemble, a stacking-based machine learning algorithm integrating multiple base models (XGBoost, SVR, BRR, and Decision Tree), to systematically predict and optimize ASP injection parameters. Unlike previous approaches, our method blends high-fidelity CMOST simulation data with machine learning precision, enabling real-time optimization with field-scale relevance. Using 500 simulation scenarios validated by laboratory input, the SL model achieved exceptional predictive performance (R² = 0.988, RMSE = 0.304), outperforming all individual learners. The optimal recovery factor (RF) of 79.49% was obtained with the finely tuned concentrations of surfactant (5483.29 ppm), polymer (2242.61 ppm), SO₄²⁻ (5610.15 ppm), CO₃²⁻ (7053.59 ppm), and Na⁺ (9939.35 ppm). Remarkably, the SL approach could reduce optimization time from 10 hours (CMOST) to under 1 minute; this underscored its potential for real-time operational deployment. The novelty of this work lies in its integrated use of ensemble learning to capture the complex and non-linear interactions between ionic chemistry and oil mobilization behavior, offering a field-ready AI framework for rapid and adaptive EOR design. This approach paves the way for the intelligent optimization of ASP schemes by minimizing the reliance on computationally intensive simulations while ensuring chemical and economic efficiency in marginal or complex reservoirs.

2025-07-31 — "Within-trial" prognostic score adjustment is targeted maximum likelihood estimation

Authors: Emilie Højbjerre-Frandsen, Alejandro Schuler
Year: 2025
Publication Date: 2025-07-31
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Adjustment for "super" or "prognostic" composite covariates has become more popular in randomized trials recently. These prognostic covariates are often constructed from historical data by fitting a predictive model of the outcome on the raw covariates. A natural question that we have been asked by applied researchers is whether this can be done without the historical data: can the prognostic covariate be constructed or derived from the trial data itself, possibly using different folds of the data, before adjusting for it? Here we clarify that such "within-trial" prognostic adjustment is nothing more than a form of targeted maximum likelihood estimation (TMLE), a well-studied procedure for optimal inference. We demonstrate the equivalence with a simulation study and discuss the pros and cons of within-trial prognostic adjustment (standard efficient estimation) relative to standard TMLE and standard prognostic adjustment with historical data.

2025-07-24 — Causal inference study of PRRSV-MLV vaccine dosing effects on wean-to-finish performance during outbreaks

Authors: Swaminathan Jayaraman, Tyler Bauman, A. Maschhoff, Caleb Shull, Peng Li, Edison S Magalhães, G. Trevisan, Daniel C. L. Linhares, Chunlin Li, Gustavo S. Silva
Year: 2025
Publication Date: 2025-07-24
Venue: Frontiers in Veterinary Science
DOI: 10.3389/fvets.2025.1575029
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Porcine Reproductive and Respiratory Syndrome Virus (PRRSV) greatly impacts swine production, and vaccination is the main method for reducing its economic effects on grow-finish populations. To cut costs, some producers use half-doses of modified live virus (MLV) vaccines, but the effectiveness of this approach during disease outbreaks is not well understood. This retrospective observational study used causal inference techniques to assess the impact of full-dose versus half-dose PRRSV-MLV vaccination on mortality and other key production outcomes in growing pigs experiencing PRRSV-2 outbreaks. Data analysis included 158 pig groups (47 nurseries, 111 finishing) from the Midwest United States that experienced PCR-confirmed PRRSV-2 outbreaks between 2021 and 2022, predominantly with L1C and L1A lineages. Mortality was established as the primary outcome, with cull rates, average daily gain, veterinary medicine costs, and percentage of grade A pigs at market as secondary outcomes. Using targeted maximum likelihood estimation (TMLE), a doubly robust causal inference technique, the study estimated the causal effects of vaccination dosage while accounting for potential confounders, including season, year, vaccine type, timing of vaccination, nursery stocking density, and presence of concurrent diseases. The analysis revealed distinct phase-specific effects: in the nursery, full-dose vaccination was associated with higher mortality difference (8.84, 95% CI: 4.7, 12.98) and increased veterinary costs (1.52 dollars/pig, 95% CI: 1.13, 1.91). However, in the finishing phase, full-dose vaccination significantly reduced the mortality difference (−3.40, 95% CI: −4.66, −2.29) despite slightly higher veterinary costs (0.47 dollars/pig, 95% CI: 0.03, 0.9). No significant differences between dosing strategies were observed in average daily gain, cull rates, or percentage of grade A pigs at the market. These findings suggest that while nursery groups vaccinated with full-dose had higher mortality and costs, full-dose vaccination provided protective benefits during the economically critical finishing phase. For swine producers and veterinarians, these results indicate that the economic advantage of half-dose vaccination strategies should be carefully weighed against the increased mortality, particularly in systems with recurring PRRSV challenges. This study demonstrates the value of causal inference methods in analyzing real-world vaccination outcomes and provides evidence-based guidance for optimizing PRRSV vaccination protocols in commercial swine production.

2025-07-23 — Super Learner Enhances Postoperative Complication Prediction in Colorectal Surgery.

Authors: T. Violante, D. Ferrari, M. Novelli, W. Perry, Kellie L Mathis, E. Dozois, D. W. Larson
Year: 2025
Publication Date: 2025-07-23
Venue: Annals of Surgery
DOI: 10.1097/SLA.0000000000006847
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
OBJECTIVE To determine if a Super Learner (SL) machine learning approach could improve the predictive accuracy of the American College of Surgeons Risk Calculator (ACS-RC) for postoperative complications in patients undergoing colorectal surgery. SUMMARY OF BACKGROUND DATA Machine learning (ML) has shown significant potential to advance medical fields, including surgical risk prediction. Current tools, like the ACS-RC which uses logistic regression and extreme gradient boosting, are standard but may be enhanced by more advanced ML ensembles. METHODS This retrospective study analyzed colorectal surgery cases from the 2018-2022 ACS National Surgical Quality Improvement Program (NSQIP) database. An SL model, which combines multiple ML algorithms, was developed to predict fourteen postoperative outcomes. Its performance was compared against traditional logistic regression (LOG) and extreme gradient boosting (XGB) models. Key performance metrics included discrimination (AUROC, AUPRC) and calibration (Brier score, Hosmer-Lemeshow test). RESULTS The SL model demonstrated superior performance across all predicted complications when compared to both LOG and XGB. It showed superior discrimination for severe outcomes, achieving an AUROC greater than 0.94 for predicting mortality. The SL model was also more accurate in predicting infectious complications and length of stay, and its calibration metrics indicated a better overall fit and accuracy. CONCLUSIONS The Super Learner model enhances the accuracy of postoperative risk prediction in colorectal surgery. Its superior performance suggests it is a promising tool for improving personalized patient counseling, aiding clinical decision-making, and optimizing resource allocation.
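
To make the idea of combining candidate algorithms by cross-validation more concrete, here is a minimal sketch of the discrete Super Learner, the simpler variant that selects the single candidate with the lowest cross-validated risk instead of fitting a weighted ensemble. It is not the model from this study: the two toy learners, the fold assignment, the data, and all function names are assumptions chosen only for illustration.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate 2: simple least-squares line in one covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def cv_risk(fit, xs, ys, folds=5):
    """Mean squared error of a learner under V-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for v in range(folds):
        # Fold membership by index modulo the number of folds (an assumption).
        train = [i for i in range(n) if i % folds != v]
        valid = [i for i in range(n) if i % folds == v]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    return total / n


def discrete_super_learner(library, xs, ys, folds=5):
    """Return the name of the lowest-risk learner and all CV risks."""
    risks = {name: cv_risk(fit, xs, ys, folds) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, risks


# Hypothetical toy data with an exactly linear signal.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
library = {"mean": fit_mean, "linear": fit_linear}
best, risks = discrete_super_learner(library, xs, ys)
```

The full Super Learner goes one step further and uses the cross-validated predictions to estimate weights for a convex combination of the candidates.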

2025-07-23 — Statistical Methods to Adjust for Treatment Switching in Real‐World Clinical Studies: A Scoping Review and Descriptive Comparison

Authors: Romain Jonathan Collet, Â. Ben, Anita Natalia Varga, F. van Leth, M. El Alili, Jonas Esser, J. Bosmans, J. V. van Dongen
Year: 2025
Publication Date: 2025-07-23
Venue: Clinical Pharmacology & Therapeutics
DOI: 10.1002/cpt.70013
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Real‐world data from sources, such as patient registries and electronic health records, can complement randomized controlled trials by providing timely, generalizable insights that better reflect routine clinical practice. However, the absence of randomization can introduce bias, particularly when treatment switching—defined as deviation from or discontinuation of the initial treatment—is influenced by time‐varying confounders, that is, variables that are associated with both treatment decisions and outcomes over time. This study presents a comprehensive overview of statistical methods used to adjust for treatment switching in real‐world studies to improve causal inference. We systematically searched MEDLINE and Embase for studies comparing at least two statistical methods for adjusting for treatment switching, from inception to December 2024. Forty‐five studies were included, identifying four main categories of methods: (1) traditional approaches (intention‐to‐treat, per‐protocol, as‐treated, repeated measures); (2) propensity score‐based methods (adjustment, matching, marginal structural models); (3) g‐methods other than marginal structural models (g‐computation, structural nested models, longitudinal targeted maximum likelihood estimation); (4) methods addressing unmeasured confounding (regression calibration, instrumental variables). Traditional methods are straightforward, but often yield biased estimates in the presence of treatment switching. Advanced methods, such as g‐methods, are designed to adjust for time‐varying confounding and can produce less biased estimates, though they require complex modeling. Instrumental variables and regression calibration relax the assumption of no unmeasured confounding, but rely on strong, often untestable conditions. By evaluating each method’s assumptions, strengths, and limitations, we support applied researchers in selecting appropriate methods to strengthen causal inference in real‐world studies.

2025-07-22 — Diabetes is causally associated with increased breast cancer mortality by inducing FIBCD1 to activate MCM5-mediated cell cycle arrest via modulating H3K27ac

Authors: Binbin Tan, Yang Liu, Qianqian Chen, Weijie Yang, Wenhan Yang, Kaiping Gao, Li Fu, Tiantian Zhang, Penglong Chen, Yongyi Huang, Yuting Wang, Guoqiang Zhang, Juan Xiong, Rihong Zhai
Year: 2025
Publication Date: 2025-07-22
Venue: Cell Death and Disease
DOI: 10.1038/s41419-025-07849-w
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Breast cancer (BC) is the most common tumor worldwide and it has been recognized that up to one third of BC patients have co-existing diabetes mellitus (DM) (BC-DM). Although many observational studies have indicated an association between DM and BC, the causal relationship of DM and BC prognosis remained uncertain and the molecular mechanisms underlying BC-DM are largely unclear. In this study, we used causal inference methods, including g-computation (GC), inverse probability of treatment weighting (IPTW), targeted maximum likelihood estimation (TMLE), and TMLE-super learner (TMLE-SL), to comprehensively analyze the association of DM with BC mortality in a cohort of 3386 BC patients. We found that the adjusted odds ratios (OR) and 95% confidence intervals (95% CI) for 5-year mortality in BC-DM patients were 1.926 (1.082, 2.943), 2.268 (1.063, 3.974), 1.917 (1.091, 2.953), and 2.113 (1.365, 3.270), respectively. Further transcriptomic and qPCR analyses identified that FIBCD1 was highly expressed in BC-DM tumor tissues and in BC cells under hyperglycemia conditions. Functionally, upregulation of FIBCD1 promoted proliferation, migration, and invasion capacities of BC cells in a glucose level-dependent manner, while knockdown of FIBCD1 suppressed BC tumor growth in diabetic mice. Integrated RNA-seq and Ribo-seq analysis revealed that MCM5 was a target of FIBCD1. Mechanistically, hyperglycemia-activated FIBCD1 promoted MCM5 expression to induce S-phase cell cycle arrest by upregulating histone H3K27ac levels in MCM5 promoter via the PDH-acetyl-CoA axis. Our findings provide new evidence that co-existing DM has a causal effect on overall mortality in BC-DM patients. Targeting FIBCD1 may be a promising therapy for BC-DM.

2025-07-16 — Targeted Deep Architectures: A TMLE-Based Framework for Robust Causal Inference in Neural Networks

Authors: Yi Li, David Mccoy, Nolan Gunter, Kaitlyn J. Lee, Alejandro Schuler, M. V. D. Laan
Year: 2025
Publication Date: 2025-07-16
Venue: arXiv.org
DOI: 10.48550/arXiv.2507.12435
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Modern deep neural networks are powerful predictive tools yet often lack valid inference for causal parameters, such as treatment effects or entire survival curves. While frameworks like Double Machine Learning (DML) and Targeted Maximum Likelihood Estimation (TMLE) can debias machine-learning fits, existing neural implementations either rely on "targeted losses" that do not guarantee solving the efficient influence function equation or computationally expensive post-hoc "fluctuations" for multi-parameter settings. We propose Targeted Deep Architectures (TDA), a new framework that embeds TMLE directly into the network's parameter space with no restrictions on the backbone architecture. Specifically, TDA partitions model parameters - freezing all but a small "targeting" subset - and iteratively updates them along a targeting gradient, derived from projecting the influence functions onto the span of the gradients of the loss with respect to weights. This procedure yields plug-in estimates that remove first-order bias and produce asymptotically valid confidence intervals. Crucially, TDA easily extends to multi-dimensional causal estimands (e.g., entire survival curves) by merging separate targeting gradients into a single universal targeting update. Theoretically, TDA inherits classical TMLE properties, including double robustness and semiparametric efficiency. Empirically, on the benchmark IHDP dataset (average treatment effects) and simulated survival data with informative censoring, TDA reduces bias and improves coverage relative to both standard neural-network estimators and prior post-hoc approaches. In doing so, TDA establishes a direct, scalable pathway toward rigorous causal inference within modern deep architectures for complex multi-parameter targets.

2025-07-15 — The effects of long-term ambient air pollutant mixture exposure on incident diabetes: A prospective cohort study in China.

Authors: Aibin Qu, F. Wen, Bingxiao Li, Pandi Li, Bowen Zhang, Xiaojun Yang, Xinyue Yao, Boya Li, X. Lao, Ling Zhang
Year: 2025
Publication Date: 2025-07-15
Venue: Ecotoxicology and Environmental Safety
DOI: 10.1016/j.ecoenv.2025.118652
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND Although an increasing number of studies have shown air pollution exposure is associated with diabetes, the potential causal effects of air pollutants on incident diabetes and the joint effects of air pollutant mixtures remain unclear. METHODS We conducted a prospective cohort study that included 25,801 adults based on Chronic Disease of the Community Natural Population in the Beijing-Tianjin-Hebei region. Three-year mean concentrations of air pollutants (PM2.5, PM10, PM1, and NO2) and PM2.5 components (ammonium [NH4+], nitrate [NO3-], sulfate [SO42-], and chloride ion [Cl-]) were obtained from China High Air Pollutants database. Targeted maximum likelihood estimation was used to estimate potential causal relationships between long-term air pollution exposure and diabetes incidence. The joint effects of air pollutant mixtures on diabetes and the contribution of each pollutant were assessed using Quantile G-computation. RESULTS In single-pollutant models, moderate and high concentrations of PM2.5, PM10, PM1, NO2, NH4+, NO3-, SO42-, and Cl- exposure were significantly associated with diabetes risk compared with low concentrations of air pollutants. In multi-pollutant models, the joint effect of air pollutant mixture (PM2.5, PM10, PM1, and NO2) on diabetes was 1.006 (1.004, 1.009). After replacing PM2.5 with PM2.5 components in the mixture, the effect estimates remained robust at 1.015 (1.008, 1.021), and the positive effect was driven primarily by NH4+ at 43.66 %, followed by NO3- at 39.20 %. CONCLUSIONS Our results revealed relationships between long-term air pollutant exposure and incident diabetes. Furthermore, NH4+ and NO3- might be strong contributors. These findings support targeted air quality interventions to reduce diabetes risk.

2025-07-15 — Medical Cannabis Use and Healthcare Utilization Among Patients with Chronic Pain: A Causal Inference Analysis Using TMLE

Authors: Mitchell L. Doucette, Emily Fisher, J. Chin, Panagiota Kitsantas
Year: 2025
Publication Date: 2025-07-15
Venue: Pharmacy
DOI: 10.3390/pharmacy13040096
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Introduction: Chronic pain affects approximately 20% of U.S. adults, imposing significant burdens on individuals and healthcare systems. Medical cannabis has emerged as a potential therapy, yet its impact on healthcare utilization remains unclear. Methods: This retrospective cohort study analyzed administrative data from a telehealth platform providing medical cannabis certifications across 36 U.S. states. Patients were classified as cannabis-exposed if they had used cannabis in the past year, while unexposed patients had no prior cannabis use. Outcomes included self-reported urgent care visits, emergency department (ED) visits, hospitalizations, and quality of life (QoL), measured using the CDC’s Healthy Days measure. Targeted Maximum Likelihood Estimation with SuperLearner estimated causal effects, adjusting for numerous covariates. Results: Medical cannabis users exhibited significantly lower healthcare utilization. Specifically, exposure was associated with a 2.0 percentage point reduction in urgent care visits (95% CI: −0.036, −0.004), a 3.2 percentage point reduction in ED visits (95% CI: −0.051, −0.012) and fewer unhealthy days per month (−3.52 days, 95% CI: −4.28, −2.76). Hospitalization rates trended lower but were not statistically significant. Covariate balance and propensity score overlap indicated well-fitting models. Conclusions: Medical cannabis use was associated with reduced healthcare utilization and improved self-reported QoL among chronic pain patients.

2025-07-15 — Constructing targeted minimum loss/maximum likelihood estimators: a simple illustration to build intuition.

Authors: Rachael K. Ross, Lina M. Montoya, Dana E. Goin, Ivan Diaz, Audrey Renson
Year: 2025
Publication Date: 2025-07-15
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwaf261
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Machine learning is increasingly used to estimate nuisance functions in causal inference. The efficient influence function (EIF) offers a principled way to construct estimators that can incorporate machine learning with valid inference (e.g., estimate valid confidence intervals). In this Tutorial, we illustrate how to construct targeted maximum likelihood/minimum loss estimators (TMLE) from the EIF, a topic that is well-covered in statistical literature but remains less accessible to applied researchers. A companion paper, Renson et al. 2025 (AJE, kwaf169), provides a thorough, but approachable description of the EIF and its derivation for a statistical estimand.
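
For readers who want to see the targeting step in code, the following is a minimal sketch of a one-step TMLE for the average treatment effect with a binary treatment and binary outcome. It is not taken from the tutorial: it assumes the initial outcome predictions (`Q1`, `Q0`) and propensity scores (`g`) have already been estimated, it solves the one-dimensional score equation by bisection in place of the usual logistic regression with an offset, and the bracket of ±20 and the toy inputs are hypothetical.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_ate(A, Y, Q1, Q0, g, iters=200):
    """One-step TMLE of the ATE for binary A and Y.

    Q1, Q0: initial estimates of E[Y | A=1, W] and E[Y | A=0, W], in (0, 1).
    g: estimated P(A=1 | W), bounded away from 0 and 1.
    Returns (ate, epsilon, residual_score).
    """
    n = len(Y)
    # Clever covariate H(A, W) and logit-scale offset at the observed treatment.
    H = [a / gi - (1 - a) / (1.0 - gi) for a, gi in zip(A, g)]
    off = [logit(q1 if a == 1 else q0) for a, q1, q0 in zip(A, Q1, Q0)]

    def score(eps):
        # Score of the logistic fluctuation model; strictly decreasing in eps.
        return sum(h * (y - expit(o + eps * h)) for h, y, o in zip(H, Y, off))

    lo, hi = -20.0, 20.0  # assumed bracket for the root of the score
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    eps = 0.5 * (lo + hi)

    # Targeted predictions under A=1 and A=0, then the plug-in estimate.
    Q1s = [expit(logit(q) + eps / gi) for q, gi in zip(Q1, g)]
    Q0s = [expit(logit(q) - eps / (1.0 - gi)) for q, gi in zip(Q0, g)]
    ate = sum(q1 - q0 for q1, q0 in zip(Q1s, Q0s)) / n
    return ate, eps, score(eps)


# Hypothetical toy inputs (not from any study in this list).
A = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 1, 0, 1, 0, 0]
g = [0.6, 0.5, 0.4, 0.7, 0.3, 0.5, 0.6, 0.4]
Q1 = [0.7, 0.6, 0.5, 0.8, 0.6, 0.7, 0.5, 0.6]
Q0 = [0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.2, 0.3]
ate, eps, resid = tmle_ate(A, Y, Q1, Q0, g)
```

The returned residual score should be numerically zero, which is the sense in which the updated fit solves the part of the EIF estimating equation that involves the outcome regression.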

2025-07-14 — Constructing Confidence Intervals for Infinite-Dimensional Functional Parameters by Highly Adaptive Lasso

Authors: Wenxin Zhang, Junming Shi, Alan Hubbard, M. V. D. Laan
Year: 2025
Publication Date: 2025-07-14
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Estimating the conditional mean function is a central task in statistical learning. In this paper, we consider estimation and inference for a nonparametric class of real-valued cadlag functions with bounded sectional variation (Gill et al., 1995), using the Highly Adaptive Lasso (HAL) (van der Laan, 2015; Benkeser and van der Laan, 2016; van der Laan, 2023), a flexible empirical risk minimizer over linear combinations of tensor products of zero- or higher-order spline basis functions under an L1 norm constraint. Building on recent theoretical advances in asymptotic normality and uniform convergence rates for higher-order spline HAL estimators, this work focuses on constructing robust confidence intervals for HAL-based estimators of conditional means. First, we propose a targeted HAL with a debiasing step to remove the regularization bias of the targeted conditional mean and also consider a relaxed HAL estimator to reduce such bias within the working model. Second, we propose both global and local undersmoothing strategies to adaptively enlarge the working model and further reduce bias relative to variance. Third, we combine these estimation strategies with delta-method-based variance estimators to construct confidence intervals for the conditional mean. Through extensive simulation studies, we evaluate different combinations of our estimation procedures, model selection strategies, and confidence-interval constructions. The results show that our proposed approaches substantially reduce bias relative to variance and yield confidence intervals with coverage rates close to nominal levels across different scenarios. Finally, we demonstrate the general applicability of our framework by estimating conditional average treatment effect (CATE) functions, highlighting how HAL-based inference methods extend to other infinite-dimensional, non-pathwise-differentiable parameters.

2025-07-14 — Causal Inference and Survey Data in Paediatric Epidemiology: Generalising Treatment Effects From Observational Data.

Authors: L. Burgos-Ochoa, Felix J. Clouth
Year: 2025
Publication Date: 2025-07-14
Venue: Paediatric and Perinatal Epidemiology
DOI: 10.1111/ppe.70042
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
BACKGROUND Survey data are essential in paediatric epidemiology, providing valuable insights into child health outcomes. The potential outcomes framework has advanced causal inference using observational data. However, traditional design-based adjustments, especially sample weights, are often overlooked. This omission limits the ability to generalise findings to the broader population. OBJECTIVE This study demonstrates three approaches for estimating the population average treatment effect (PATE) in a practical example, examining the impact of household second-hand smoke (SHS) exposure on blood pressure in school-aged children. METHODS Using data from the National Health and Nutrition Examination Survey (NHANES) 2017-2020, we assessed the effect of household SHS exposure, a non-randomised treatment, on blood pressure in school-aged children. We applied estimators based on Inverse Probability of Treatment Weighting (IPTW), G-computation, Targeted Maximum Likelihood Estimation (TMLE), and regression adjustment. Models without adjustments were run for comparison. We examined point estimates and the efficiency of the estimates obtained from these methods. RESULTS The largest differences were observed between the unadjusted regression models and the fully adjusted methods (IPTW, G-computation, and TMLE), which account for both confounding and survey weights. While the inclusion of the sample weights leads to wider confidence intervals for all methods, G-computation and TMLE showed comparatively narrower confidence intervals. Confidence intervals for the models not adjusted for sample weights were likely underestimated. CONCLUSIONS This study highlights the important role of sample weights in causal inference. Generalisability of the average treatment effect as estimated on data sampled using common survey designs to a defined population requires the use of sample weights. The estimators described provide a framework for incorporating sample weights, and their use in health research is recommended.

2025-07-11 — The Three-Dimensional Debordering of Language Policies by AI Translation: Theoretical Modeling and Multicultural Cases from Complex Adaptive Systems

Authors: Pengfei Bao
Year: 2025
Publication Date: 2025-07-11
Venue: Forum for Linguistic Studies
DOI: 10.30564/fls.v7i7.10085
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
This study employs Complex Adaptive Systems (CAS) theory to construct a Technology-Mediated Language Ecology (TMLE) framework, decoding how AI translation technologies restructure language policies through adaptive agent interactions and nonlinear emergence—two core principles of CAS. The TMLE model proposes a three-dimensional analytical framework (geospatial, socio-functional, and semiotic debordering), reimagining language policies as dynamic ecosystems where technological mediation and societal practices co-evolve. Taking the EU’s multilingual governance and Egypt’s 2023 educational reform on dialectal Arabic as paradigmatic cases, the research demonstrates how adaptive agent coalitions—comprising governments, translation platforms (e.g., DeepL), and grassroots communities—collaboratively dismantle traditional policy boundaries. For instance, in the EU, neural machine translation (NMT) enabled a tripartite interaction among the European Commission, tech developers, and regional language activists, facilitating the rise of non-English languages in official domains. In Egypt, WhatsApp’s auto-transliteration tools, used by students and educators, compelled policymakers to recognize Egyptian Arabic (Masri) in digital literacy curricula, illustrating how bottom-up tech practices and institutional responses form a CAS-driven feedback loop. Through these cases, the study reveals that the traditional "territory-function" paradigm fails due to its static, linear logic, whereas the TMLE model—rooted in CAS’s principles of emergence and adaptive coordination—provides a robust framework for understanding tech-mediated language policy dynamics. The findings call for a shift from state-centric regulatory control to ecosystemic stewardship, where policies act as facilitators of adaptive linguistic networks rather than enforcers of rigid boundaries.

2025-07-09 — Potential source of bias in AI models: lactate measurement in the ICU in sepsis patients as a template

Authors: P. Pradhan, F. W. Haug, N. Abu Hussein, D. Moukheiber, Lama Moukheiber, M. Moukheiber, Sulaiman Moukheiber, L. Weishaupt, J. Ellen, H. D'Couto, I.C. Williams, L. A. Celi, J. Matos, T. Struja
Year: 2025
Publication Date: 2025-07-09
Venue: Frontiers in Medicine
DOI: 10.3389/fmed.2025.1606254
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective Health inequities may be driven by demographics such as sex, language proficiency, and race-ethnicity. These disparities may manifest through likelihood of testing, which in turn can bias artificial intelligence models. We aimed to evaluate variation in serum lactate measurements in the intensive care unit (ICU) in sepsis. Methods Utilizing MIMIC-IV (2008–2019), we identified adults fulfilling sepsis-3 criteria. Exclusion criteria were ICU stay < 1-day, unknown race-ethnicity, < 18 years of age, and recurrent ICU-stays. Employing targeted maximum likelihood estimation analysis, we assessed the likelihood of a lactate measurement on day 1. For patients with a measurement on day 1, we evaluated the predictors of subsequent readings. Results We studied 15,601 patients (19.5% racial-ethnic minority, 42.4% female, and 10.0% limited English proficiency). After adjusting for confounders, Black patients had a slightly higher likelihood of receiving a lactate measurement on day 1 [odds ratio 1.19, 95% confidence interval (CI) 1.06–1.34], but not the other minority groups. Subsequent frequency was similar across race-ethnicities, but women had a lower incidence rate ratio (IRR) 0.94 (95% CI 0.90–0.98). Patients with elective admission and private insurance also had a higher frequency of repeated serum lactate measurements (IRR 1.70, 95% CI 1.61–1.81 and 1.07, 95% CI, 1.02–1.12, respectively). Conclusion We found no disparities in the likelihood of a lactate measurement among patients with sepsis across demographics, except for a small increase for Black patients, and a reduced frequency for women. Subsequent analyses should account for the variation in biomarker monitoring being present in MIMIC-IV.

2025-07-01 — Performance of Cross‐Validated Targeted Maximum Likelihood Estimation

Authors: Matthew J Smith, Rachael V. Phillips, Camille Maringe, M. Luque-Fernández
Year: 2025
Publication Date: 2025-07-01
Venue: Statistics in Medicine
DOI: 10.1002/sim.70185
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Advanced methods for causal inference, such as targeted maximum likelihood estimation (TMLE), require specific convergence rates and the Donsker class condition for valid statistical estimation and inference. In situations where there is no differentiability due to data sparsity or near‐positivity violations, the Donsker class condition is violated. In such instances, the bias of the targeted estimand is inflated, and its variance is anti‐conservative, leading to poor coverage. Cross‐validation of the TMLE algorithm (CVTMLE) is a straightforward, yet effective way to ensure efficiency, especially in settings where the Donsker class condition is violated, such as random or near‐positivity violations. We aim to investigate the performance of CVTMLE compared to TMLE in various settings.

2025-07-01 — Advancing Computer‐Assisted Diabetic Retinopathy Grading: A Super Learner Ensemble Technique for Fundus Imagery

Authors: Mili Rosline Mathews, M. AnzarS.
Year: 2025
Publication Date: 2025-07-01
Venue: International journal of imaging systems and technology (Print)
DOI: 10.1002/ima.70152
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Diabetic retinopathy (DR) is a severe complication of diabetes mellitus and is a predominant global cause of blindness. The accuracy of DR grading is of paramount importance to enable timely and appropriate clinical interventions. This study presents an innovative and comprehensive approach to DR grading that combines convolutional neural networks with an ensemble of diverse machine learning algorithms, referred to as a super learner ensemble. Our methodology includes a preprocessing pipeline designed to enhance the quality of the fundus images in the dataset. To further refine DR grading, we introduce a novel feature extraction model named “RetinaXtract” in conjunction with advanced machine learning classifiers. Statistical analysis tools, specifically the Friedman and Nemenyi tests, are employed to identify the most effective machine learning algorithms. Subsequently, a super learner ensemble is devised by integrating the predictions of the highest‐performing machine learning algorithms. This ensemble approach captures a wide range of patterns, thereby enhancing the system's ability to accurately distinguish between different DR stages. Notably, accuracy rates of 99.64%, 99.51%, and 99.16% are achieved on the IDRiD, Kaggle, and Messidor datasets, respectively. This research represents a significant contribution to the field of DR grading, offering a balanced, efficient, and precise classification solution. The introduced methodology has demonstrated substantial promise and holds significant potential for practical applications in the detection and grading of DR from fundus images, ultimately leading to improved clinical outcomes in ophthalmology.

2025-06-26 — Two-stage targeted minimum-loss based estimation for non-negative two-part outcomes

Authors: Nicholas T Williams, Richard Liu, Katherine L. Hoffman, Sarah Forrest, Kara E. Rudolph, Iván Díaz
Year: 2025
Publication Date: 2025-06-26
Venue: Statistical Methods in Medical Research
DOI: 10.1177/09622802251340245
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Non-negative two-part outcomes are defined as outcomes with a density function that have a zero point mass but are otherwise positive. Examples, such as healthcare expenditure and hospital length of stay, are common in healthcare utilization research. Despite the practical relevance of non-negative two-part outcomes, few methods exist to leverage knowledge of their semicontinuity to achieve improved performance in estimating causal effects. In this paper, we develop a nonparametric two-stage targeted minimum-loss based estimator (denoted as hTMLE) for non-negative two-part outcomes. We present methods for a general class of interventions, which can accommodate continuous, categorical, and binary exposures. The two-stage TMLE uses a targeted estimate of the intensity component of the outcome to produce a targeted estimate of the binary component of the outcome that may improve finite sample efficiency. We demonstrate the efficiency gains achieved by the two-stage TMLE with simulated examples and then apply it to a cohort of Medicaid beneficiaries to estimate the effect of chronic pain and physical disability on days’ supply of opioids.

2025-06-26 — Estimating average causal effects with incomplete exposure and confounders

Authors: Lan Wen, Glen McGee
Year: 2025
Publication Date: 2025-06-26
DOI: 10.1515/jci-2023-0083
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Standard methods for estimating average causal effects require complete observations of the exposure and confounders. In observational studies, however, missing data are ubiquitous. Motivated by a study on the effect of prescription opioids on mortality, we propose methods for estimating average causal effects when exposures and potential confounders may be missing. We consider missingness at random and additionally propose several specific missing not at random (MNAR) assumptions. Under our proposed MNAR assumptions, we show that the average causal effects are identified from the observed data and derive corresponding influence functions, which form the basis of our proposed estimators. Our simulations show that standard multiple imputation techniques paired with a complete data estimator are unbiased when data are missing at random (MAR) but can be biased otherwise. For each of the MNAR assumptions, we instead propose doubly robust targeted maximum likelihood estimators (TMLE), allowing misspecification of either (i) the outcome models or (ii) the exposure and missingness models. The proposed methods are suitable for any outcome types, and we apply them to a motivating study that examines the effect of prescription opioid usage on all-cause mortality using data from the National Health and Nutrition Examination Survey (NHANES).

2025-06-26 — Causal inference via implied interventions

Authors: Carlos García Meixide, Mark J. van der Laan
Year: 2025
Publication Date: 2025-06-26
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
In the context of having an instrumental variable, the standard practice in causal inference begins by targeting an effect of interest and proceeds by formulating assumptions enabling identification of this effect. We turn this around by simply not making assumptions anymore and just adhere to the interventions we can identify, rather than starting with a desired causal estimand and imposing untestable hypotheses. The randomization of an instrument and its exclusion restriction define a class of auxiliary stochastic interventions on the treatment that are implied by stochastic interventions on the instrument. This mapping effectively characterizes the identifiable causal effects of the treatment on the outcome given the observable probability distribution, leading to an explicit transparent G-computation formula under hidden confounding. Alternatively, searching for an intervention on the instrument whose implied one best approximates a desired target -- whose causal effect the user aims to estimate -- naturally leads to a projection on a function space representing the closest identifiable treatment effect. The generality of this projection allows to select different norms and indexing sets for the function class that turn optimization into different estimation procedures with the Highly Adaptive Lasso. This shift from identification under assumptions to identification under observation redefines how the problem of causal inference is approached.

2025-06-20 — Regularized Targeted Maximum Likelihood Estimation in Highly Adaptive Lasso Implied Working Models

Authors: Yi Li, Sky Qiu, Zeyi Wang, M. V. D. Laan
Year: 2025
Publication Date: 2025-06-20
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation, tmle

Abstract:
We address the challenge of performing Targeted Maximum Likelihood Estimation (TMLE) after an initial Highly Adaptive Lasso (HAL) fit. Existing approaches that utilize the data-adaptive working model selected by HAL-such as the relaxed HAL update-can be simple and versatile but may become computationally unstable when the HAL basis expansions introduce collinearity. Undersmoothed HAL may fail to solve the efficient influence curve (EIC) at the desired level without overfitting, particularly in complex settings like survival-curve estimation. A full HAL-TMLE, which treats HAL as the initial estimator and then targets in the nonparametric or semiparametric model, typically demands costly iterative clever-covariate calculations in complex set-ups like survival analysis and longitudinal mediation analysis. To overcome these limitations, we propose two new HAL-TMLEs that operate within the finite-dimensional working model implied by HAL: Delta-method regHAL-TMLE and Projection-based regHAL-TMLE. We conduct extensive simulations to demonstrate the performance of our proposed methods.

2025-06-18 — Causal machine learning to investigate the effectiveness of vancomycin as the empiric treatment choice in patients hospitalized with community acquired pneumonia

Authors: Fabrizio Pecoraro, Scott A. Cohen, M. Prosperi
Year: 2025
Publication Date: 2025-06-18
Venue: IEEE International Conference on Healthcare Informatics
DOI: 10.1109/ICHI64645.2025.00035
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Community-acquired pneumonia (CAP) is a significant cause of mortality among older adults. The early initiation of appropriate empiric antimicrobial treatment is crucial, yet the role of vancomycin, especially regarding its impact on patient outcomes, is controversial. This study investigates the effects of vancomycin on 30-day mortality and acute kidney failure in older adults hospitalized with CAP using observational data. Statistical analyses included causal random forests to identify treatment effect heterogeneity and targeted maximum likelihood estimation (TMLE) for estimating average treatment effects. Vancomycin use was linked to higher 30-day mortality as well as acute kidney failure, in particular in patients on mechanical ventilation. This study underscores the necessity of re-evaluating empirical antibiotic strategies in CAP, advocating for a more tailored approach to minimize unnecessary risks.

2025-06-17 — Local anesthesia is associated with better functional outcomes than conscious sedation in endovascular thrombectomy for acute ischemic stroke: A retrospective analysis of the OPTIMISE registry.

Authors: G. Mendes, Alexandre Y. Poppe, S. Verreault, Alexander Khaw, Richard H. Swartz, Darren Ferguson, Aditya Bharatha, G. Medvedev, D. Volders, G. Stotts, A. Katsanos, G. Jacquin
Year: 2025
Publication Date: 2025-06-17
Venue: Interventional Neuroradiology
DOI: 10.1177/15910199251349662
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Introduction There are several possible anesthetic strategies during endovascular therapy (EVT) for acute ischemic stroke (AIS), including general anesthesia (GA), conscious sedation (CS), and local anesthesia (LA). While randomized trials have not shown a clear advantage of GA or CS, LA remains understudied. We aimed to determine if LA is associated with better functional outcomes compared to CS in a Canadian EVT registry. Patients and Methods A retrospective analysis of the OPTIMISE registry was conducted, focusing on adult patients with anterior circulation AIS treated with EVT between January 2018 and December 2021. Patients with available information regarding anesthetic modality and 3-month functional outcome were included. The primary endpoint was a favorable functional outcome at 3 months (defined as a modified Rankin Scale score of 0-2) when using LA compared to CS (average treatment effect [ATE] determined by targeted maximum likelihood estimation). Secondary outcomes included procedural time, favorable reperfusion, complications, and symptomatic intracranial hemorrhage. Results A total of 2204 patients were included in the analysis (763 LA, 1441 CS). In the LA group, 57.5% (n = 439) had a favorable outcome at 3 months compared to 55.6% (n = 801) in the CS group (ATE 0.04 [0.00-0.07]; adjusted odds ratio 1.16 [1.01-1.34]; p = 0.04). No significant difference was found between groups regarding reperfusion rates, procedural times, and symptomatic intracranial hemorrhage. Conclusion In this large, Canadian multicenter cohort of patients undergoing EVT for anterior circulation AIS, LA was safe and led to better functional outcomes at 3 months compared to CS. Given its simplicity and potential benefits, LA warrants greater consideration in clinical practice and inclusion as a treatment arm in future randomized controlled trials studying the optimal anesthetic strategy for EVT.

2025-06-16 — DBNX: A Machine Learning Method for Ensembling Polygenic Risk Scores and Non-Genetic Factors

Authors: Xiangzhe Yuan, Chonghao Wang, Shuqin Zhu, Lu Zhang
Year: 2025
Publication Date: 2025-06-16
Venue: IEEE Transactions on Computational Biology and Bioinformatics
DOI: 10.1109/TCBBIO.2025.3580193
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Polygenic risk scoring (PRS) holds promise for improving disease prediction and medical treatments by evaluating an individual’s genetic susceptibility through multiple genetic variants. However, current PRS calculation methods often excel only in specific diseases and populations, with no single approach consistently outperforming others across all contexts. Furthermore, these methods frequently overlook non-genetic factors, such as lifestyle, that also impact disease risk. We introduce an unsupervised Deep Belief Network (DBN) to aggregate PRS generated by various methods, achieving performance comparable to the Super Learner method—a supervised ensemble approach that combines predictions from multiple methods to improve outcomes. Unlike supervised methods, the DBN does not require training data and can directly ensemble the available PRS. Remarkably, on small-scale datasets, the DBN outperforms the Super Learner. Additionally, we present the DBNX model, which integrates PRS with non-genetic factors using a combination of DBN and XGBoost. DBNX produces a Composite Risk Score (CRS) that incorporates information from both PRS and non-genetic factors. In our experiments using the U.K. Biobank (UKBB) dataset across four diseases, DBNX demonstrated superior performance compared to other commonly used ensemble methods.

2025-06-13 — 111-OR: Assessing the Diabetic Eye Disease Care Continuum—Screening Gaps and Predictors among Type 2 Diabetes Mellitus Patients at Two Academic Medical Centers within the University of California Health System

Authors: Aryan Ayati, Stella Ko, Nicole Bonine, David Tabano, Nina Malik, Shadera Azzam, Cobi Ben-David, Michelle Wang, Frank Brodie, Mitul C Mehta, Vivek A Rudrapatna
Year: 2025
Publication Date: 2025-06-13
Venue: Diabetes
DOI: 10.2337/db25-111-or
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Introduction and Objective: The American Diabetes Association recommends annual eye exams for adults with Diabetes Mellitus Type 2 (T2DM); however, barriers to timely screening persist. This study assesses the diabetic eye care continuum among patients with T2DM at the University of California, San Francisco (UCSF) and Irvine (UCI). Methods: Electronic health records at UCSF and UCI were queried to identify adult patients with T2DM seen by a primary care provider (PCP) from 2020 through the end of 2022. We estimated the proportion of screening-eligible patients referred to a specialist and those who completed a screening visit within 12 months as well as those diagnosed with and treated for DR. Significant predictors of referrals and screening visits were identified using adjusted logistic regression. Targeted Maximum Likelihood Estimation (TMLE) estimated the impact of an automated referral system. Results: The patient population needing diabetic eye screening included 2,612 patients with T2DM at UCSF and 5,661 at UCI. UCSF had higher 1-year rates of referrals (53.6% vs. 37.4%) and eye screening visits (21.0% vs. 13.4%) (p<0.001). DR diagnosis prevalence was 3.6% at UCSF and 3.4% at UCI, with treatment rates at 0.7% and 1.0%, respectively. Referrals (OR 57.3[47.5-77.2]), previous eye diseases (OR 6.6[5.8-7.4]), and Charlson comorbidity index (OR 1.2[1.1-1.9]) were associated with higher screening rates. A higher Area Deprivation Index (OR 0.8[0.8, 0.9]) was associated with a lower likelihood of screening visits. TMLE suggested screening rates could improve to 34% at UCSF and 24% at UCI with automated referrals. Conclusion: Most patients with T2DM did not receive timely diabetic eye disease screening. Implementing an automated referral system could significantly boost screening rates, but gaps may persist. Further research is needed to understand these gaps and design interventions to improve eye health. A. Ayati: Research Support; Genentech, Inc. S. Ko: Employee; Genentech, Inc. N.G. Bonine: None. D. Tabano: Employee; Genentech, Inc. N. Malik: Employee; Genentech, Inc. S. Azzam: None. C. Ben-David: None. M. Wang: Research Support; BeiGene, Amgen Inc. F. Brodie: Consultant; Genentech, Inc. Research Support; Genentech, Inc. Advisory Panel; Genentech, Inc, Apellis. M. Mehta: Research Support; Genentech, Inc. Speaker's Bureau; Astellas Pharma Inc. Consultant; Ani. Research Support; Zeiss. Stock/Shareholder; Eyedaptic. Speaker's Bureau; Apellis. Research Support; jCyte. V. Rudrapatna: Research Support; Genentech, Inc, Merck & Co., Inc, Janssen Pharmaceuticals, Inc, Stryker, Mitsubishi Tanabe Pharma Corporation, Blueprint Medicines, Beigene. Consultant; Ironwood, Natera. Advisory Panel; ZebraMD, DataUnite, Acucare.

2025-06-06 — When Measurement Mediates the Effect of Interest

Authors: Joy Zora Nakato, Janice Litunya, B. Beesiga, J. Kabami, J. Ayieko, M. Kamya, G. Chamie, L.B Balzer
Year: 2025
Publication Date: 2025-06-06
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Many health promotion strategies aim to improve reach into the target population and outcomes among those reached. For example, an HIV prevention strategy could expand the reach of risk screening and the delivery of biomedical prevention to persons with HIV risk. This setting creates a complex missing data problem: the strategy improves health outcomes directly and indirectly through expanded reach, while outcomes are only measured among those reached. To formally define the total causal effect in such settings, we use Counterfactual Strata Effects: causal estimands where the outcome is only relevant for a group whose membership is subject to missingness and/or impacted by the exposure. To identify and estimate the corresponding statistical estimand, we propose a novel extension of Two-Stage targeted minimum loss-based estimation (TMLE). Simulations demonstrate the practical performance of our approach as well as the limitations of existing approaches.

2025-06-03 — Causal Inference with Missing Exposures and Missing Outcomes

Authors: Kirsten E. Landsiedel, Rachel Abbott, Atukunda Mucunguzi, F. Mwangwa, E. Kakande, Edwin D. Charlebois, Carina Marquez, M. Kamya, L.B Balzer
Year: 2025
Publication Date: 2025-06-03
Link: Semantic Scholar
Matched Keywords: super learner, tmle

Abstract:
Missing data are ubiquitous in public health research. When estimating causal effects, there are well-established methods to address bias due to censored outcomes. Commonly, causal estimands are defined under hypothetical interventions to "set" the exposure and to prevent censoring. Identification is evaluated with the sequential backdoor criterion and considerations of data support. Then inverse weighting, standardization, and doubly-robust approaches are applied for statistical estimation and inference. We demonstrate how this framework can be extended to settings with missingness on the exposure of interest as well as the variable defining the population of interest (e.g., persons at risk of the outcome). Our work is motivated by SEARCH-TB's investigation of the effect of alcohol consumption on the risk of incident tuberculosis (TB) infection in rural Uganda. This study posed several real-world challenges: confounding, missingness on the exposure (alcohol use), missingness on the baseline outcome (defining who was at risk of TB), and missingness on the outcome at follow-up (capturing who acquired TB). We present a series of causal models and identification results to demonstrate the handling of missing exposures and outcomes in prospective studies. We highlight the use of TMLE with Super Learner and the real-world consequences of our approach.

2025-06-01 — P-701 Leveraging explainable Machine Learning to predict monitoring variables during ovarian stimulation for optimal timing of the trigger day

Authors: M. Dellenbach, J. Julie, G. Dormion, H. Bonneau-Chloup, C. Yazbeck, C. Rongieres
Year: 2025
Publication Date: 2025-06-01
Venue: Human Reproduction
DOI: 10.1093/humrep/deaf097.1007
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
How can AI models predicting monitoring variables, such as hormone levels and follicle size, be used to determine the optimal trigger day for ovarian stimulation? A robust methodology enables accurate prediction of monitoring variables, helping doctors make informed decisions and select the optimal trigger day for ovarian stimulation effectively. Efforts to standardize the optimal trigger day for ovarian stimulation have yet to achieve consensus, with practices varying widely across countries and even among clinics within the same nation. Existing machine learning models address this issue but often consider a limited range of days, restricting their practical applicability. To address these limitations, we leverage causal inference techniques to estimate optimal trigger day through rigorous statistical modeling. Unlike traditional models, causal inference methods evaluate all possible scenarios while providing clear insights into the underlying mechanisms, offering a transparent, robust and data-driven approach to improve decision-making in ovarian stimulation protocols. In a retrospective multi-center study, we analyzed data from 18,285 IVF ovarian stimulation cycles between 2012 and 2024 in 3 French fertility clinics. To estimate the causal effect of trigger day on the number of follicles over 14 mm, we selected a subset of confounding covariates, including age, Body Mass Index (BMI) and monitoring variables throughout the stimulation period. This approach aims to better understand factors influencing trigger day and follicular development. We used a Super Learner model, updated throughout the stimulation with monitoring data, to predict follicles over 14 mm (F14). Six models are trained sequentially: the first on pre-stimulation data to predict F14 from days 5 to 16. Each subsequent model incorporates monitoring data from days 5 to 8, predicting F14 from the next day onward up to day 16. 
This iterative approach refines predictions using evolving data, enhancing accuracy throughout the stimulation period. We present the results of the first Super Learner model, which does not yet incorporate monitoring data. The peak estimation of 8.5 follicles >14mm occurred on day 12. To estimate prediction uncertainty, we used conformal prediction, which provides reliable confidence intervals, ensuring accurate and trustworthy uncertainty estimates. The average predicted number of F14 per day from day 5 to 16 was 0.3 (±0.8), 0.5 (±1.3), 1.8 (±3.1), 3.3 (±4.6), 6.6 (±5.9), 7.9 (±6.7), 8.3 (±6.9), 8.5 (±6.8), 8.1 (±6.4), 7.7 (±6.1), 7.6 (±5.3), and 7.1 (±5.5), respectively, with a coverage rate of 89% (SD 9%). For prediction accuracy, we used the Mean Absolute Error (MAE), with values of 0.46, 0.66, 1.48, 2.47, 2.76, 2.93, 2.90, 3.24, 2.78, 3.19, 2.56, and 3.43, respectively. Those predictions of the number of follicles over 14mm as an indicator for follicle maturity can be used by clinicians to anticipate the optimal trigger day. We need to reduce the width of confidence intervals while incorporating the number of daily observations into our approach. Our goal is to ensure the interval size does not exceed the mean prediction. We also aim to better acknowledge biases, especially regarding the day of the trigger. This methodology can be applied to other monitoring variables, helping clinicians anticipate the optimal trigger day. Expanding the study with data from other fertility centers with patients and treatments variability will enhance external validity. Prospective testing of algorithm recommendations and adherence is crucial to assess real-world effectiveness across different populations. No

2025-06-01 — Machine learning-aided prediction of COD removal in the electrocoagulation process using a super learner model

Authors: Mhd Taisir Albaba, Mohammed Talhami, Abdullah Omar, Sumith Varghese, Rayane Akoumeh, M. A. Ayari, Probir Das, Ali Altaee, Maryam Al‐Ejji, Alaa H. Hawari
Year: 2025
Publication Date: 2025-06-01
Venue: Journal of Environmental Chemical Engineering
DOI: 10.1016/j.jece.2025.117469
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-06-01 — Machine learning ensemble meets clinical practice: developing a real-world risk prediction model for metabolic syndrome using super learner and scorecard approaches.

Authors: Shuwen Li, Yu Zhang, Kang Fu, Kailu Fang, Luyan Zheng, Yushi Lin, Jie Wu
Year: 2025
Publication Date: 2025-06-01
Venue: Journal of Advanced Research
DOI: 10.1016/j.jare.2025.06.072
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-06-01 — Ensemble Super Learner Model for predicting strain at maximum stress and energy absorption capacity of UHPFRC

Authors: Seunghye Lee, J. Abellán-García, T. Vo, Trung-Kien Nguyen
Year: 2025
Publication Date: 2025-06-01
Venue: Journal of Building Engineering
DOI: 10.1016/j.jobe.2025.113001
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-06-01 — Community health worker–facilitated telehealth for moderate–severe hypertension care in Kenya and Uganda: A randomized controlled trial

Authors: Matthew D Hickey, A. Owaraganise, Sabina Ogachi, N. Sang, Erick M Wafula, J. Kabami, Nicole Sutter, Jennifer Temple, Anthony Muiru, G. Chamie, E. Kakande, Maya Petersen, L.B Balzer, D. Havlir, M. Kamya, J. Ayieko
Year: 2025
Publication Date: 2025-06-01
Venue: PLoS Medicine
DOI: 10.1371/journal.pmed.1004632
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background Hypertension is underdiagnosed and undertreated in sub-Saharan Africa. Improving hypertension treatment within primary health centers can improve cardiovascular disease outcomes; however, individuals with moderate–severe hypertension face additional barriers to care, including the need for frequent clinic visits to titrate medications. We conducted a pilot study to test whether a clinician-driven, community health worker (CHW)–facilitated telehealth intervention would improve hypertension control among adults with severe hypertension in rural Uganda and Kenya. Methods and findings We conducted a pilot randomized controlled trial (RCT) of hypertension treatment delivered via telehealth by a clinician (adherence assessment, counseling, decision-making) and facilitated by a CHW in the participant’s home, compared to clinic-based hypertension care (NCT04810650). We recruited adults ≥40 years with BP ≥ 160/100 mmHg at household screening by CHWs, with no restrictions by HIV status. After initial evaluation at the clinic, participants were randomized to telehealth or clinic-based hypertension follow-up. Randomization assignment was not blinded, except for the study statistician. All participants were treated using standard country guideline-based antihypertensive drugs. The primary outcome was hypertension control at 24 weeks (BP < 140/90 mmHg). We also assessed hypertension control at 48 weeks. In intention-to-treat analyses, we compared outcomes between randomized arms with targeted minimum loss-based estimation using sample-splitting to select optimal adjustment covariates (candidates: age, sex, baseline hypertension severity, and country). We screened 2,965 adults ≥40 years, identifying 266 (9%) with severe hypertension and enrolling 200 (98 in the telehealth arm, 102 in the clinic arm). Participants were 67% women, median age of 62 years (Q1–Q3 51–72); 14% with HIV.
Week 24 blood pressure was measured in 96/98 intervention and 99/102 control participants; week 24 hypertension control was 77% in the telehealth arm and 51% in the clinic arm (risk difference (RD) 26%, 95% confidence interval (CI) [14%, 38%], p < 0.001). Week 48 hypertension control was 86% in the telehealth arm and 44% in the clinic arm (RD 42%, 95% CI [30%, 53%], p < 0.001). Three participants died (telehealth: 2, clinic: 1); all deaths were unrelated to the study interventions. Our study was limited by its small sample size, although findings are strengthened by being conducted in three primary health centers across two countries. Conclusion In this pilot RCT, clinician-driven, CHW-facilitated telehealth for hypertension management improved hypertension control and reduced severe hypertension compared to clinic-based care. Telehealth focused on individuals with moderate–severe hypertension is a promising approach to improve outcomes among those with the highest risk for CVD.

2025-05-30 — Optimizing personalized screening intervals for clinical biomarkers using extended joint models

Authors: N. Mchunu, Henry Mwambi, Tarylee Reddy, N. Yende-Zuma, D. Rizopoulos
Year: 2025
Publication Date: 2025-05-30
Venue: Journal of Applied Statistics
DOI: 10.1080/02664763.2025.2505636
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
This research advances joint modeling and personalized scheduling for HIV and TB by incorporating censored longitudinal outcomes in multivariate joint models, providing a more flexible and accurate approach for complex data scenarios. Inspired by the SAPiT study, we deviate from standard model selection procedures by using super learning techniques to identify the optimal model for predicting future events in event-free subjects. Specifically, the Integrated Brier score and Expected Predictive Cross-Entropy (EPCE) identified the multivariate joint model with the parameterization of the area under the longitudinal profiles of CD4 count and viral load as optimal and strong predictors of death. Integrating this model with a risk-based screening strategy, we recommend extending intervals to 10.3 months for stable patients, with additional measurements every 12 months. For patients with deteriorating health, we suggest a 3.5-month interval, followed by 6.2 months, and then annual screenings. These findings refine patient care protocols and advance personalized medicine in HIV/TB co-infected individuals. Furthermore, our approach is adaptable, allowing adjustments based on patients' evolving health status. While focused on HIV/TB co-infection, this method has broader applicability, offering a promising avenue for biomarker studies across various disease populations and potential for future clinical trials and biomarker-guided therapies.

2025-05-29 — Estimating the Risk of Lower Extremity Complications in Adults Newly Diagnosed With Diabetic Polyneuropathy: Retrospective Cohort Study

Authors: Alyce S Adams, Catherine Lee, G. Escobar, Elizabeth Bayliss, Brian C. Callaghan, Michael A Horberg, J. Schmittdiel, Connie Trinacty, Lisa K. Gilliam, Eileen Kim, N. Hejazi, Lin Ma, Romain S. Neugebauer
Year: 2025
Publication Date: 2025-05-29
Venue: JMIR Diabetes
DOI: 10.2196/60141
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background Diabetes-related lower extremity complications, such as foot ulceration and amputation, are on the rise, currently affecting nearly 131 million people worldwide. Methods for early detection of individuals at high risk remain elusive. While data-driven diabetic polyneuropathy algorithms exist, high-performing, clinically useful tools to assess risk are needed to improve clinical care. Objective This study aimed to develop an electronic medical record–based machine learning algorithm that would predict lower extremity complications. Methods We conducted a retrospective longitudinal cohort study to predict the risk of lower extremity complications within 24 months of an initial diagnosis of diabetic polyneuropathy. From an initial cohort of 468,162 individuals with at least 1 diagnosis of diabetic polyneuropathy at one of 2 multispecialty health care systems (based in northern California and Colorado) between April 2012 and December 2016, we created an analytic cohort of 48,209 adults with continuous enrollment, who were newly diagnosed with no evidence of end-of-life care. The outcome was any lower extremity complication, including foot ulceration, osteomyelitis, gangrene, or lower extremity amputation. We randomly split the data into training (38,569/48,209; 80%) and testing (9,640/48,209; 20%) datasets. In the training dataset, we used Super Learner (SL), an ensemble learning method that employs cross-validation and combines multiple candidate risk predictors into a single risk predictor. We evaluated the performance of the SL risk predictor in the testing dataset using the receiver operating characteristic curve and a calibration plot. Results Of the 48,209 individuals in the cohort, 2,327 developed a lower extremity complication during follow-up. The SL risk estimator exhibited good discrimination (AUC=0.845, 95% CI 0.826-0.863) and calibration.
A modified version of our SL algorithm, simplified to facilitate real-world adoption, had only slightly reduced discrimination (AUC=0.817, 95% CI 0.797-0.837). The modified version slightly outperformed the naïve logistic regression model (AUC=0.804, 95% CI 0.783-0.825) in terms of precision gained relative to the frequency of alerts and number of patients that needed to be evaluated. Conclusions We have built a machine learning–based risk estimator with the potential to improve clinical detection of diabetic patients at high risk for lower extremity complications at the time of an initial diabetic polyneuropathy diagnosis. The algorithm exhibited good discriminant validity and calibration using only data from the electronic medical record. Additional research will be needed to identify optimal contexts and strategies for maximizing algorithmic fairness in both interpretation and deployment.
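
Illustrative note (not from the paper): the abstract above describes Super Learner as an ensemble that uses cross-validation to combine candidate risk predictors into one. The sketch below shows that idea in its simplest form with two toy candidates and simulated data. Every name in it (`fit_mean`, `fit_threshold`, `super_learner`) is hypothetical, the weight is chosen by a plain grid search, and none of it reflects the authors' actual candidate library or software.

```python
import random

def fit_mean(rows):
    """Candidate 1: predict the training-set event rate for everyone."""
    rate = sum(y for _, y in rows) / len(rows)
    return lambda x: rate

def fit_threshold(rows):
    """Candidate 2: predict the event rate separately for x < 0.5 and x >= 0.5."""
    overall = sum(y for _, y in rows) / len(rows)
    lo = [y for x, y in rows if x < 0.5]
    hi = [y for x, y in rows if x >= 0.5]
    p_lo = sum(lo) / len(lo) if lo else overall
    p_hi = sum(hi) / len(hi) if hi else overall
    return lambda x: p_hi if x >= 0.5 else p_lo

def cross_val_predictions(rows, fitters, k=5):
    """Out-of-fold predictions: one list per candidate, aligned with rows."""
    preds = [[0.0] * len(rows) for _ in fitters]
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        for j, fit in enumerate(fitters):
            model = fit(train)
            for i, (x, _) in enumerate(rows):
                if i % k == fold:
                    preds[j][i] = model(x)
    return preds

def super_learner(rows, fitters, k=5, grid=21):
    """Choose the convex weight (two candidates) with the lowest cross-validated squared error."""
    cv = cross_val_predictions(rows, fitters, k)
    ys = [y for _, y in rows]

    def loss(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(cv[0], cv[1], ys)) / len(ys)

    w = min((i / (grid - 1) for i in range(grid)), key=loss)
    m0, m1 = (fit(rows) for fit in fitters)  # refit both candidates on all data
    return w, (lambda x: w * m0(x) + (1 - w) * m1(x))

# Simulated data (an assumption for illustration): risk is 0.8 when x >= 0.5, else 0.2.
random.seed(0)
xs = [random.random() for _ in range(400)]
rows = [(x, 1 if random.random() < (0.8 if x >= 0.5 else 0.2) else 0) for x in xs]
weight_on_mean, predict = super_learner(rows, [fit_mean, fit_threshold])
```

Because the simulated risk really does depend on the threshold, cross-validation should put most of the weight on the second candidate. Real applications use many more candidates and a proper constrained optimiser for the weights.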

2025-05-28 — Racial and ethnic differences in the relationship of SARS-CoV-2 infection and the COVID-19 pandemic period with perinatal health in California.

Authors: Emily F Liu, Shelley Jung, Kara E Rudolph, M. Mujahid, William H Dow, Dana E. Goin, R. Morello-Frosch, Jennifer Ahern
Year: 2025
Publication Date: 2025-05-28
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001878
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND In this paper, we test the hypothesis that SARS-CoV-2 infection and the COVID-19 pandemic period had stronger adverse implications for perinatal outcomes among marginalized racial and ethnic groups in California. METHODS We used California birth certificate and hospital data from 2019-2021 to estimate marginal risk differences for SARS-CoV-2 infection and the COVID-19 pandemic period in relation to perinatal outcomes for Asian, Black, Hispanic, Multiracial, and White pregnant people using targeted maximum likelihood estimation. RESULTS Among 849,401 deliveries, there were racial and ethnic disparities in the burden of SARS-CoV-2 infection and perinatal outcomes, and in the magnitudes of risk associated with SARS-CoV-2 infection and the COVID-19 pandemic. Hispanic pregnant people had the highest incidence of SARS-CoV-2 infection. Asian and Black pregnant people had the greatest marginal risk differences for multiple outcomes, particularly outcomes already disproportionately experienced by these groups. CONCLUSIONS Risks from SARS-CoV-2 infection and the COVID-19 pandemic period on perinatal outcomes were disproportionately experienced by marginalized racial and ethnic groups. Differential burdens of infection and larger risks experienced with pandemic exposures were associated with worse perinatal outcomes for Asian, Black, and Hispanic pregnant people in California compared with those for White pregnant people.

2025-05-20 — Statistical Methods for Chemical Mixtures: A Roadmap for Practitioners Using Simulation Studies and a Sample Data Analysis in the PROTECT Cohort

Authors: W. Hao, A. Cathey, Max M Aung, J. Boss, J. Meeker, B. Mukherjee
Year: 2025
Publication Date: 2025-05-20
Venue: Environmental Health Perspectives
DOI: 10.1289/EHP15305
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Quantitative characterization of the health impacts associated with exposure to chemical mixtures has received considerable attention in current environmental and epidemiological studies. With many existing statistical methods and emerging approaches, it is important for practitioners to understand which method is best suited for their inferential goals. Objective: The goal of this paper is to provide empirical simulation-based evidence regarding performance of mixture methods to help guide researchers on selecting the best available methods to address three scientific questions in mixtures analysis: identifying important components of a mixture, identifying interactions among mixture components, and creating a summary score for risk stratification and prediction. Methods: We conducted a review and comparison of 11 analytical methods available for use in mixtures research through extensive simulation studies for continuous and binary outcomes. In addition, we carried out an illustrative data analysis using the PROTECT birth cohort from Puerto Rico to examine the associations between exposure to chemical mixtures—metals, polycyclic aromatic hydrocarbons (PAHs), phthalates, and phenols—and birth outcomes. Results: Our simulation results suggest that the choice of methods depends on the goal of analysis and that there is no clear winner across the board. For selection of important toxicants in the mixtures and for identifying interactions, Elastic net (Enet) by Zou et al., Lasso for Hierarchical Interactions (HierNet) by Bien et al., and selection of nonlinear interactions by a forward stepwise algorithm (SNIF) by Narisetty et al. have the most stable performance across simulation settings. For overall summary or a cumulative measure, we find that using the Super Learner to combine multiple environmental risk scores can lead to improved risk stratification and prediction properties. 
Conclusions: We develop an integrated R package “CompMix” that provides a platform for mixtures analysis where the practitioners can implement a pipeline that includes several approaches for mixtures analysis. Our study offers guidelines for selecting appropriate statistical methods for addressing specific scientific questions related to mixtures research. We identify critical gaps where new and better methods are needed. https://doi.org/10.1289/EHP15305

2025-05-16 — Analysis and Prediction of Mental Disorder using Ensemble Learning

Authors: K. Rani, Mohammad Saniya, E. K. Raj, Pratyush Kumar Nayak, K. Mahesh, G. R. Reddy
Year: 2025
Publication Date: 2025-05-16
Venue: 2025 International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC)
DOI: 10.1109/ASSIC64892.2025.11158673
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Effective prediction tools for early intervention are required due to the rising incidence of mental health illnesses. With a focus on the Super Learner algorithm for increased accuracy, this research introduces a novel method for predicting mental diseases utilizing ensemble learning approaches. The suggested approach collects survey replies from respondents at different levels, each of which corresponds to a distinct set of questions that are administered in different sessions. The algorithm analyzes the data and forecasts the person's mental health status based on the answers from these few sessions, offering tailored intervention suggestions. The Super Learner algorithm is used to integrate several machine learning models, and the model makes use of a large dataset to train and improve the prediction system. By adjusting the survey questions and adding more information, this method maximizes accuracy. According to the findings, there is encouraging potential for precise mental condition prediction, which could aid in early detection and lessen the stigma associated with mental health.

2025-05-15 — The effect of lifting eviction moratoria on fatal drug overdoses in the context of the COVID-19 pandemic in the US.

Authors: Ariadne Rivera-Aguirre, Iván Díaz, Giselle Routhier, Cameron C McKay, Ellicott C. Matthay, Samuel R. Friedman, Kely M Doran, Magdalena Cerdá
Year: 2025
Publication Date: 2025-05-15
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwaf105
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Between May 2020 and December 2021, there were 159,872 drug overdose deaths in the US. Higher eviction rates have been associated with higher overdose mortality. Amid the economic turmoil caused by the COVID-19 pandemic, 43 states and Washington, DC, implemented eviction moratoria of varying durations. These moratoria reduced eviction filing rates, but their impact on fatal drug overdoses remains unexplored. We evaluated the effect of these policies on county-level overdose death rates by focusing on the dates the state eviction moratoria were lifted. We obtained mortality data from NCHS and eviction moratoria dates from the COVID-19 US State Policy Database. We employed longitudinal targeted minimum loss-based estimation with Super Learner to flexibly estimate the average treatment effect (ATE) of never lifting the moratoria. Lifting state eviction moratoria was associated with a 0.14 per 100,000 higher rate of monthly overdose mortality (95% CI: -0.03, 0.32), although the confidence interval was wide and included zero. Eviction moratoria may not be sufficient to prevent overdose mortality during crises such as the COVID-19 pandemic.

2025-05-15 — A hybrid super learner ensemble for phishing detection on mobile devices

Authors: Routhu Srinivasa Rao, Cheemaladinne Kondaiah, A. R. Pais, Bumshik Lee
Year: 2025
Publication Date: 2025-05-15
Venue: Scientific Reports
DOI: 10.1038/s41598-025-02009-8
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In today’s digital age, the rapid increase in online users and massive network traffic has made ensuring security more challenging. Among the various cyber threats, phishing remains one of the most significant. Phishing is a cyberattack in which attackers steal sensitive information, such as usernames, passwords, and credit card details, through fake web pages designed to mimic legitimate websites. These attacks primarily occur via emails or websites. Several antiphishing techniques, such as blacklist-based, source code analysis, and visual similarity-based methods, have been developed to counter phishing websites. However, these methods have specific limitations, including vulnerability to zero-day attacks, susceptibility to drive-by-downloads, and high detection latency. Furthermore, many of these techniques are unsuitable for mobile devices, which face additional constraints, such as limited RAM, smaller screen sizes, and lower computational power. To address these limitations, this paper proposes a novel hybrid super learner ensemble model named Phish-Jam, a mobile application specifically designed for phishing detection on mobile devices. Phish-Jam utilizes a super learner ensemble that combines predictions from diverse Machine Learning (ML) algorithms to classify legitimate and phishing websites. By focusing on extracting features from URLs, including handcrafted features, transformer-based text embeddings, and other Deep Learning (DL) architectures, the proposed model offers several advantages: fast computation, language independence, and robustness against accidental malware downloads. From the experimental analysis, it is observed that the super learner ensemble achieved significant accuracy of 98.93%, precision of 99.15%, MCC of 97.81% and F1 Score of 99.07%.

2025-05-01 — On the Mechanistic Interpretability of Neural Networks for Causality in Bio-statistics

Authors: Jean-Baptiste A. Conan
Year: 2025
Publication Date: 2025-05-01
Venue: arXiv.org
DOI: 10.48550/arXiv.2505.00555
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Interpretable insights from predictive models remain critical in bio-statistics, particularly when assessing causality, where classical statistical and machine learning methods often provide inherent clarity. While Neural Networks (NNs) offer powerful capabilities for modeling complex biological data, their traditional "black-box" nature presents challenges for validation and trust in high-stakes health applications. Recent advances in Mechanistic Interpretability (MI) aim to decipher the internal computations learned by these networks. This work investigates the application of MI techniques to NNs within the context of causal inference for bio-statistics. We demonstrate that MI tools can be leveraged to: (1) probe and validate the internal representations learned by NNs, such as those estimating nuisance functions in frameworks like Targeted Minimum Loss-based Estimation (TMLE); (2) discover and visualize the distinct computational pathways employed by the network to process different types of inputs, potentially revealing how confounders and treatments are handled; and (3) provide methodologies for comparing the learned mechanisms and extracted insights across statistical, machine learning, and NN models, fostering a deeper understanding of their respective strengths and weaknesses for causal bio-statistical analysis.

2025-05-01 — How Effective Are Machine Learning and Doubly Robust Estimators in Incorporating High‐Dimensional Proxies to Reduce Residual Confounding?

Authors: Mohammad Ehsanul Karim, Yang Lei
Year: 2025
Publication Date: 2025-05-01
Venue: Pharmacoepidemiology and Drug Safety
DOI: 10.1002/pds.70155
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Residual confounding presents a persistent challenge in observational studies, particularly in high‐dimensional settings. High‐dimensional proxy adjustment methods, such as the high‐dimensional propensity score (hdPS), are widely used to address confounding bias by incorporating proxies for unmeasured confounders. Extensions of hdPS have integrated machine learning, such as LASSO and super learner (SL), and doubly robust estimators, such as targeted maximum likelihood estimation (TMLE). However, the comparative performance of these methods, especially under different learner configurations and high‐dimensional proxies, remains unclear.

2025-05-01 — An efficient super learner model for predicting wettability of the hydrogen/mineral/brine system: Implication for hydrogen geo-storage

Authors: Menad Nait Amar, Mohamed Riad Youcefi, F. M. Alqahtani, Hakim Djema, Mohammad Ghasemi
Year: 2025
Publication Date: 2025-05-01
Venue: International Journal of Hydrogen Energy
DOI: 10.1016/j.ijhydene.2025.03.450
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-04-30 — Assessing Racial Disparities in Healthcare Expenditures via Mediator Distribution Shifts

Authors: Xiaxian Ou, Xinwei He, D. Benkeser, Razieh Nabi
Year: 2025
Publication Date: 2025-04-30
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Racial disparities in healthcare expenditures are well-documented, yet the underlying drivers remain complex and require further investigation. This study develops a framework for decomposing such disparities through shifts in the distributions of mediating variables, rather than treating race itself as a manipulable exposure. We define disparities as differences in covariate-adjusted outcome distributions across racial groups, and decompose the total disparity into two components: one attributable to differences in mediator distributions, and another residual component that would remain even after equalizing these distributions. Using data from the Medical Expenditures Panel Survey, we examine the extent to which expenditure disparities would persist or be reduced if mediators such as socioeconomic status, insurance access, health behaviors, or health status were equalized across racial groups. To ensure valid inference, we derive asymptotically linear estimators based on influence-function techniques and flexible machine learning tools, including super learners and a two-part model designed for the zero-inflated, right-skewed nature of expenditure data.

2025-04-29 — Student-Centered Approach in Higher Education to Transform Learning in India – A New ISL Model

Authors: P. S. Aithal, Shubhrajyotsna Aithal
Year: 2025
Publication Date: 2025-04-29
Venue: Poornaprajna International Journal of Management Education & Social Science (PIJMESS)
DOI: 10.64818/pijmess.3107.4626.0021
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Purpose: The purpose of this scholarly paper, "Student-Centered Approach in Higher Education to Transform Learning in India – A New ISL Model", is to conceptualize and propose an innovative educational framework—Integrated Super Learning (ISL) Model—that holistically enhances student development by unifying three critical dimensions: confidence boosting through skill-based learning, maturity development through ethical and value-based education, and competency building through technology and research integration. Grounded in both conceptual and analytical methodologies, the paper aims to fill the existing gaps in fragmented student-centered learning models by offering a comprehensive, scalable, and contextually relevant solution for Indian higher education institutions. Methodology: This study employs an exploratory research design, integrating conceptual and analytical frameworks to propose and validate the new Integrated Super Learning (ISL) model aimed at transforming student-centered learning in Indian higher education. The proposed model is analysed using appropriate analysis frameworks. Results/Analysis: The paper employs qualitative methods, including an extensive literature review and stakeholder consultations, to develop and validate the Integrated Super Learning (ISL) Model. To evaluate the feasibility and practicality of the ISL framework, two strategic tools are applied (1) SWOC Analysis (Strengths, Weaknesses, Opportunities, Challenges) – This is used to assess the internal and external factors that influence the implementation and effectiveness of the ISL model from institutional and academic standpoints. (2) ABCD Analysis (Advantages, Benefits, Constraints, Disadvantages) – framework provides a balanced examination of the ISL model’s holistic value and limitations in real-world educational environments. 
Originality/Value: The innovative educational framework—Integrated Super Learning (ISL) Model—that holistically enhances student development by unifying three critical dimensions: confidence boosting through skill-based learning, maturity development through ethical and value-based education, and competency building through technology and research integration. The analyses provide a comprehensive, evidence-based understanding of the model’s applicability in the Indian higher education system, highlighting both its transformative potential and practical challenges. Type of Paper: Exploratory, model based qualitative Analysis.

2025-04-24 — Synergistic improvements in mechanical and thermal performance of TiB2 solid-solution-based composites

Authors: Zhuang Li (李壮), Cun You (由存), Z. Li (李), Xuepeng Li (李雪鹏), Guiqian Sun (孙贵乾), Xinglin Wang (王星淋), Q. Jia (贾), Qiang Tao (陶强), P. Zhu (朱)
Year: 2025
Publication Date: 2025-04-24
Venue: Chinese Physics B
DOI: 10.1088/1674-1056/add00c
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Continuously improving the mechanical properties of ultra-high-temperature ceramics (UHTCs) is a key requirement for their future applications. However, the mechanical properties of conventional UHTCs, HfB2 and ZrB2, remain unsatisfactory among transition metal light-element (TMLE) compounds. TiB2 has superior mechanical properties compared to both HfB2 and ZrB2, but suffers from inherent brittleness and limited oxidation resistance. In this work, low-content solid-solution strengthening was used to fabricate dense samples of Tix(Hf/Zr)1−xB2 (THZ) under high pressure and high temperature (HPHT). Compared to pure TiB2, Ti0.94(Hf/Zr)0.06B2 exhibits a significant 38.8% increase in oxidation resistance temperature (950 °C), while Ti0.91(Hf/Zr)0.09B2 shows a notable 28% enhancement in fracture toughness (5.8 MPa⋅m^(1/2)). The synergistic effect of a dual-atom solid-solution results in local internal stress and anomalous lattice contraction. This lattice contraction helps resist oxygen invasion, thereby elevating the oxidation resistance threshold. Additionally, the internal stress induces crack deflection within individual grains, enhancing toughness through energy dissipation. This work provides a new strategy for fabricating robust UHTCs within TMLE systems, demonstrating significant potential for future high-temperature applications.

2025-04-23 — COVID-19 Vaccination Timing, Relative to Acute COVID-19, and Subsequent Risk of Long COVID

Authors: Zachary Butzin-Dozier, Yunwen Ji, Lin-Chiun Wang, A. Anzalone, Jeremy Coyle, Rachael V. Phillips, Rena C Patel, Jing Sun, Eric Hurwitz, Sarang Deshpande, Junming Shi, Andrew Mertens, Mark van der Laan, J. Colford, Alan E. Hubbard
Year: 2025
Publication Date: 2025-04-23
Venue: medRxiv
DOI: 10.1101/2025.04.22.25326224
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objectives: Long COVID is a debilitating condition that impacts millions of Americans, but patients and clinicians have little information on how to prevent this disorder. Vaccination is a vital tool in preventing acute COVID-19 and may confer additional protection against Long COVID. There is limited evidence regarding the optimal timing of COVID-19 vaccination (i.e., vaccination schedule) to minimize the risk of Long COVID. Methods: We applied Longitudinal Targeted Maximum Likelihood Estimation to electronic health record (EHR) data from a retrospective cohort of patients vaccinated against COVID-19 between December 2021 and September 2022. We evaluated the association between binary COVID-19 vaccination status (two or more doses vs. zero doses) and 12-month Long COVID risk among patients diagnosed with acute COVID-19 between December 2021 and September 2022. In addition, we compared the 12-month cumulative risk of Long COVID (ICD-10 code U09.9) among patients diagnosed with acute COVID-19 one to three months after vaccination, three to five months after vaccination, or five to seven months after vaccination while adjusting for relevant high-dimensional baseline and time-dependent covariates. Results: We analyzed EHR data from a retrospective cohort of 1,558,018 patients. In our binary cohort (n = 519,980), we found that vaccinated patients had a lower risk of Long COVID than unvaccinated patients (adjusted marginal risk ratio 0.84 (0.81, 0.88)). In our longitudinal cohort (n = 1,085,291), we did not find a significant difference in Long COVID risk comparing patients who were diagnosed with acute COVID-19 one to three months after vaccination versus patients who were diagnosed with COVID-19 three to five months (adjusted marginal risk ratio 0.93 (95% CI 0.62, 1.41)) or five to seven months (adjusted marginal risk ratio 1.06 (95% CI 0.72, 1.56)) after vaccination.
Conclusions: We found that COVID-19 vaccination before SARS-CoV-2 infection was protective against Long COVID, and we did not find that this protection significantly waned within 7 months after vaccination. These findings suggest that COVID-19 vaccination protects against Long COVID.

2025-04-18 — The optimal spatial averaging method for random/non-random missing data via super learner and its application to Tara Oceans data

Authors: Aixian Chen, Xiaoyun Huang, Xia Cui
Year: 2025
Publication Date: 2025-04-18
Venue: Stochastic Environmental Research and Risk Assessment
DOI: 10.1007/s00477-025-02972-8
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-04-12 — COMPARATIVE EFFECTIVENESS OF PROPENSITY SCORE ESTIMATION METHODS FOR INVERSE PROBABILITY OF TREATMENT WEIGHTING ANALYSIS WITH COMPLEX SURVEY DATA: A SIMULATION STUDY.

Authors: Lihua Li, Chen Yang, Liangyuan Hu, Wei Zhang, Melissa Aldridge, Bian Liu, Madhu Mazumdar
Year: 2025
Publication Date: 2025-04-12
Venue: Journal of Survey Statistics and Methodology
DOI: 10.1093/jssam/smaf003
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Propensity score (PS) methods, including inverse probability of treatment weighting (IPTW) analysis, are increasingly applied to complex survey data in geriatric studies to infer causal effects. However, the comparative effectiveness of various PS estimation methods, particularly novel machine learning algorithms, has not been thoroughly explored when complex survey data are involved. We conducted a comprehensive simulation study to compare the following six PS estimation methods in IPTW analysis: Logistic Regression, Covariate Balancing Propensity Score, Generalized Boosted Model, Classification and Regression Tree, Random Forest (RF), and Super Learner. We considered 12 scenarios with varying treatment effects, degrees of non-linearity and non-additivity in the associations between covariates and the exposure, and levels of PS overlap. The performance of these six methods was assessed in terms of mean relative bias, root mean square error, and coverage probability. The results showed a similar performance across all methods when PS overlap was strong. However, RF consistently outperformed the other methods when PS overlap was not strong and under non-additive and non-linear scenarios. The results suggest RF to be a more effective approach for PS estimation than the other proposed methods when applying IPTW analysis to complex survey data for population average treatment effects. The methods were applied to data from the Medicare Current Beneficiary Survey for years 2002-2019 to estimate the impact of hospice use on end-of-life healthcare costs. Findings from the real-world example show that hospice use was significantly associated with reduced end-of-life healthcare costs of Medicare beneficiaries.
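
Illustrative note (not from the paper): the IPTW step that the abstract above compares PS estimators for can be sketched in a few lines. Treated units are weighted by 1/PS, untreated units by 1/(1 − PS), and the weighted outcome means are differenced. The function name `iptw_ate` and the ten toy records are hypothetical, the propensity scores are taken as known rather than estimated, and the survey weights central to the paper are deliberately left out.

```python
def iptw_ate(records):
    """Normalised (Hajek) IPTW estimate of the average treatment effect.

    records: iterable of (treated, outcome, propensity) with 0 < propensity < 1.
    """
    num1 = den1 = num0 = den0 = 0.0
    for treated, outcome, ps in records:
        if treated:
            w = 1.0 / ps            # weight for treated units
            num1 += w * outcome
            den1 += w
        else:
            w = 1.0 / (1.0 - ps)    # weight for untreated units
            num0 += w * outcome
            den0 += w
    return num1 / den1 - num0 / den0

# Hypothetical toy data: two equally sized strata, true effect 1.0 in each.
records = (
    [(1, 3.0, 0.8)] * 4 + [(0, 2.0, 0.8)]      # stratum A: mostly treated
    + [(1, 1.0, 0.2)] + [(0, 0.0, 0.2)] * 4    # stratum B: mostly untreated
)

treated = [y for t, y, _ in records if t]
control = [y for t, y, _ in records if not t]
naive_diff = sum(treated) / len(treated) - sum(control) / len(control)
ate = iptw_ate(records)
```

On this toy data the unweighted difference in means is about 2.2 because treatment is concentrated in the higher-outcome stratum, while the weighted estimate is about 1.0, the effect built into both strata.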

2025-04-11 — ALVAC-prime and monomeric gp120 protein boost induces distinct HIV-1 specific humoral and cellular responses compared with adenovirus-prime and trimeric gp140 protein boost

Authors: Leigh H Fisher, E. Lazarus, Chenchen Yu, Z. Moodie, D. Stieh, N. Yates, Lu Zhang, Sheetal S Sawant, Stephen C. De Rosa, Kristen W. Cohen, D. Morris, S. Grant, A. Randhawa, Maurine D. Miner, Jenny Hendriks, F. Wegmann, Katherine Gill, F. Laher, L. Bekker, Glenda E Gray, L. Corey, M. McElrath, Troy Martin, P. Gilbert, G. Tomaras, Stephen R. Walsh, Lindsey R Baden
Year: 2025
Publication Date: 2025-04-11
Venue: PLOS Global Public Health
DOI: 10.1371/journal.pgph.0004250
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Although clade-specific and cross-clade mosaic prime–boost HIV-1 vaccine regimens were advanced to the HVTN 702 and HVTN 705 efficacy trials, neither regimen prevented HIV acquisition. The respective Phase 1/2a studies, HVTN 100 (NCT02404311) and HVTN 117/HPX2004 (NCT02788045), provided rich immunological data, including previously identified correlates of risk, for comparing immune responses elicited by these vaccine regimens over time. We analyzed antibody responses measured by binding antibody multiplex assay, and CD4+ and CD8+ T-cell responses measured by intracellular cytokine staining in per-protocol vaccinees in HVTN 100 (n=186) vs. HVTN 117/HPX2004 (n=99) after the months 6 and 12 vaccinations (months 6.5/7 and 12.5/13), and 6 months after the last vaccination (month 18). At month 12.5/13, both regimens induced similarly high IgG breadth against gp120, gp140, and V1V2 antigens, and similar IgG responses to gp70-BCaseA V1V2. IgG V1V2 responses were more durable in HVTN 117/HPX2004, with the largest difference in the gp70-BCaseA V1V2 IgG response rate at month 18 (17.8% in HVTN 100 vs 61.9% in HVTN 117/HPX2004, p<0.001). IgG3 responses to consensus Env antigens were higher and more durable in HVTN 117/HPX2004; for example, IgG3 response rate to the consensus gp140 antigen was 65.9% in HVTN 117/HPX2004 vs 6.3% in HVTN 100 at month 18 (TMLE p<0.0001). At month 18, both regimens induced similar IgG3 responses to gp70-BCaseA V1V2 (3.2% in HVTN 100 vs 1.1% in HVTN 117/HPX2004). Polyfunctional CD4+ Env was significantly higher in HVTN 100, and polyfunctional CD4+ Gag was higher in HVTN 117/HPX2004. CD8+ T-cell responses were not seen in HVTN 100, while CD8+ T-cell response rates in HVTN 117/HPX2004 reached up to 42%.
Despite the distinct immune responses induced by the two HIV vaccine regimens, the lack of demonstrated efficacy suggests that broader, higher magnitude, and possibly qualitatively different immune responses are needed for protection against HIV acquisition. Trial registration: ClinicalTrials.gov NCT02404311 and NCT02788045; South African National Clinical Trials Registry (DOH-27-0215-4796)

2025-04-07 — A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa

Authors: Meron Asmamaw Alemayehu, Shimels Derso Kebede, Agmasie Damtew Walle, D. Mamo, Ermias Bekele Enyew, Jibril Bashir Adem
Year: 2025
Publication Date: 2025-04-07
Venue: Frontiers Big Data
DOI: 10.3389/fdata.2025.1522578
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Introduction: Vaccination is critical for reducing childhood mortality, yet completion rates for the third dose of the pentavalent vaccine (Penta 3) in East Africa remain inadequate. This study aims to predict Penta 3 vaccination dropout using a stacking ensemble machine learning model with Demographic and Health Survey (DHS) data. The objective is to identify predictors of dropout and enhance intervention strategies. Methods: The study utilized seven base machine learning algorithms to create a stacked ensemble model with three meta-learners: Random Forest (RF), Generalized Linear Model (GLM), and Extreme Gradient Boosting (XGBoost). The H2O package facilitated the development of base learners and the stacking of super learners. Feature selection (FS) and comparisons were performed using the LASSO and Boruta algorithms. The selected features were one-hot encoded, and ordinal encoding was applied where appropriate. Hyperparameter optimization (HPO) and comparisons were conducted using grid search and random search. Model performance was assessed using five key metrics, including accuracy and the area under the curve (AUC). SHAP (Shapley Additive Explanations) values were used to interpret the model outputs and identify influential predictors. The experimental design was employed to present the results. Results: Four experiments were conducted to evaluate feature selection and HPO methods. All stacked ensemble models outperformed individual learners, with the XGBoost meta-learner optimized with grid search and LASSO FS achieving the highest performance: 93.9% accuracy and 99.4% AUC. While RF and GLM meta-learners were also evaluated, they were outperformed by the XGBoost meta-learner. SHAP analysis revealed key features influencing Penta 3 dropout, including the place of delivery, decision-making autonomy, the mother's level of earning, and healthcare access.
Home delivery increased the risk of dropout, while postnatal care by midwives and health insurance coverage lowered dropout likelihood. Conclusion and recommendation: This study provides insights into the factors influencing Penta 3 vaccination dropout in East Africa. To reduce dropout rates, interventions should focus on enhancing maternal livelihood opportunities, improving healthcare access in rural areas, and promoting institutional deliveries.

2025-04-04 — Weak instrumental variables due to nonlinearities in panel data: A Super Learner Control Function estimator

Authors: Monika Avila-Marquez
Year: 2025
Publication Date: 2025-04-04
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
A triangular structural panel data model with additive separable individual-specific effects is used to model the causal effect of a covariate on an outcome variable when there are unobservable confounders, some of which are time-invariant. In this setup, a linear reduced-form equation might be problematic when the conditional mean of the endogenous covariate given the instrumental variables is nonlinear. The reason is that ignoring the nonlinearity could lead to weak instruments (instruments that are weakly correlated with the endogenous covariate). As a solution, we propose a triangular simultaneous equation model for panel data with additive separable individual-specific fixed effects, composed of a linear structural equation and a nonlinear reduced-form equation. The parameter of interest is the structural parameter of the endogenous variable. The identification of this parameter is obtained under the assumption of available exclusion restrictions and using a control function approach. Estimating the parameter of interest is done using an estimator that we call the Super Learner Control Function (SLCF) estimator. The estimation procedure is composed of two main steps and sample splitting. First, we estimate the control function using a super learner. In the following step, we use the estimated control function to control for endogeneity in the structural equation. Sample splitting is done across the individual dimension. The estimator is consistent and asymptotically normal, achieving a parametric rate of convergence. We perform a Monte Carlo simulation to test the performance of the estimators proposed. We conclude that the Super Learner Control Function Estimators significantly outperform Within 2SLS estimators. Finally, we show that the SLCF estimator differs from both the plug-in IV estimator and a naive plug-in 2SLS estimator.

2025-04-01 — Soil Organic Carbon (SOC) Prediction using Super Learner Algorithm Based on the Remote Sensing Variables

Authors: Yeonpyeong Jo, P. Panja, Hanseup Kim, Milind Deo
Year: 2025
Publication Date: 2025-04-01
Venue: Environmental Challenges
DOI: 10.1016/j.envc.2025.101160
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025-04-01 — Peer-mother counseling improves HIV treatment adherence and status disclosure over time among pregnant and postpartum women in rural Uganda

Authors: Jane Kabami, L.B Balzer, Stella Kabageni, Catherine A Koss, Faith Kagoya, Jaffer Okiring, Joanita Nangendo, Emmanuel Ruhamyankaka, Peter Ssebutinde, Elizabeth Arinitwe, Michael Ayebare, John Bosco Tamu Munezeo, Valence Mfitumukiza, Anne Ruhweza Katahoire, M. Kamya, Philippa Musoke
Year: 2025
Publication Date: 2025-04-01
Venue: AJE Advances: Research in Epidemiology
DOI: 10.1093/ajeadv/uuaf002
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Adherence to antiretroviral therapy (ART) and disclosure of HIV status are critical for achieving HIV viral suppression and eliminating perinatal transmission of HIV. The ENHANCED-SPS intervention was designed to address barriers to viral suppression among pregnant and postpartum women with HIV and included standardized support and counseling through phone calls by peer-mothers. Using targeted minimum loss-based estimation (TMLE), we evaluated changes in adherence (≤1 dose of ART missed per month) and HIV status disclosure (to anyone and to a spouse or partner) among 505 pregnant and postpartum women with HIV who received the ENHANCED-SPS intervention in rural Uganda (2019-2021). ART adherence was 68% (95% CI, 62-74) at baseline and increased to 93% (95% CI, 81-100) after 12 months, corresponding to a 25% increase (95% CI, 9-40; P = .009). The largest improvements were among participants who were aged 15-24 years, breastfeeding, or without viral suppression at enrollment. At baseline, 80% (95% CI, 69-90) had disclosed their HIV status to anyone, increasing to 94% (95% CI, 89-99) after 12 months and corresponding to a 14% improvement (95% CI, 8-21; P = .003). Similar trends were observed for disclosure to a spouse or partner. Among pregnant and postpartum women with HIV in rural Uganda, the ENHANCED-SPS intervention was associated with meaningful improvements in ART adherence and HIV status disclosure after 1 year.

2025-04-01 — Loneliness and all cause mortality in Australian women aged 45 years and older: causal inference analysis of longitudinal data

Authors: Neta Hagani, P. Clare, D. Merom, Ben J Smith, D. Ding
Year: 2025
Publication Date: 2025-04-01
Venue: BMJ Medicine
DOI: 10.1136/bmjmed-2024-001004
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective: To examine the causal effects of loneliness on mortality among Australian women aged 45 years and older. Design: Causal inference analysis of longitudinal data. Participants: A population based sample of Australian women aged 45 years and older (n=11 412). Main outcome measures: Targeted maximum likelihood estimations were used to analyse the causal relationship between loneliness and all cause mortality over 18 years. The adjusted risk of death associated with the total number of loneliness waves (loneliness persistency) and the consecutive number of loneliness waves (loneliness chronicity) was presented using risk ratios and risk differences with 99.5% confidence intervals (CIs). Results: The association between the number of waves of reported loneliness and mortality risk showed a dose-dependent pattern. Compared with women who did not report loneliness in any wave, people who reported loneliness at two, four, and six waves had an incrementally higher risk of dying during the follow-up period: risk ratio 1.49 (99.5% CI 1.26 to 1.75) at two waves, 2.18 (1.79 to 2.66) at four waves, and 3.15 (2.35 to 4.23) at six waves. The risk difference showed a similar trend to the risk ratios with higher excess mortality among women who reported experiencing loneliness for six waves compared with those who did not report loneliness at all (10.86% (99.5% CI 10.58% to 11.15%)). Similar trends were found when loneliness was experienced across consecutive waves. Conclusions: Loneliness seems to be causally linked to mortality risk with a dose-dependent relationship. Acknowledging loneliness as an independent health risk underscores the importance of screening for loneliness and incorporating public health interventions into healthcare practices.

2025-04-01 — Global Tariff Shocks and U.S. Agriculture: Causal Machine Learning Approaches to Competitiveness and Market Share Forecasting

Authors: Sunday Oladimeji Adegoke, Obunadike Thankgod Chiamaka
Year: 2025
Publication Date: 2025-04-01
Venue: International Journal of Research Publication and Reviews
DOI: 10.55248/gengpi.6.0425.16109
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Global tariff fluctuations present significant challenges to the competitiveness and sustainability of U.S. agricultural exports. Historically, changes in tariff structures—whether from retaliatory actions, trade renegotiations, or shifts in geopolitical alliances—have induced volatile shifts in global market shares, farm incomes, and supply chain stability. Traditional econometric models, while valuable for trend analysis, often struggle to isolate causal relationships and predict nuanced competitive dynamics under rapidly evolving policy environments. This study explores the application of causal machine learning (CML) techniques, including causal forests, synthetic control methods, and targeted maximum likelihood estimation, to forecast the impacts of global tariff shocks on U.S. agricultural competitiveness. By leveraging high-frequency trade data, tariff schedules, and market intelligence indicators, we develop predictive models that move beyond correlation, focusing instead on estimating counterfactual scenarios and heterogeneous treatment effects across commodity classes. The research covers major U.S. export commodities such as soybeans, corn, and dairy products, illustrating how CML approaches enhance foresight into market share erosion or resilience under different tariff regimes. Particular attention is given to differentiating impacts across trading partners, regional markets, and commodity types. The findings highlight the strengths of causal machine learning in enabling policymakers and agribusiness leaders to anticipate strategic vulnerabilities, optimize export strategies, and design more resilient agricultural trade frameworks. This study positions CML as an essential toolkit for navigating the increasingly complex intersections of international trade policy and agricultural

2025-03-26 — Bystander Defibrillation and Survival According to Emergency Medical Service Response Time After Out-of-Hospital Cardiac Arrest: A Nationwide Registry-Based Cohort Study

Authors: M. Hindborg, H. Yonis, F. Gnesin, K. K. Sørensen, M. Andersen, Frank Eriksson, Zehao Su, F. Folke, K. B. Ringgren, C. Hansen, H. Christensen, K. Kragholm, C. Torp-Pedersen
Year: 2025
Publication Date: 2025-03-26
Venue: Prehospital Emergency Care
DOI: 10.1080/10903127.2025.2478211
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objectives: The impact of emergency medical services (EMS) response times when integrating bystanders’ automated external defibrillator (AED) use into established response systems remains unclear. This study aims to investigate 30-day survival probabilities for different EMS response times for bystander and non-bystander defibrillated patients and identify for which EMS response times bystander defibrillation improves 30-day survival probability. Methods: Data on patients with bystander-witnessed out-of-hospital cardiac arrests (OHCAs) with initial shockable rhythm who received bystander cardiopulmonary resuscitation were retrieved from the Danish Cardiac Arrest Registry for years 2016–2022. Proportions of 30-day survival were calculated for five intervals of EMS response time for patients who received bystander defibrillation and those who did not. The causal inference framework utilizing targeted maximum likelihood estimation was used to estimate 30-day survival probability for each interval of EMS response time and when comparing cases where bystander defibrillation was performed with those where it was not. This analysis was adjusted for relevant confounding factors and conducted separately for residential and public OHCAs. Results: The study included 3,924 patients with OHCA. Bystander defibrillation was more frequent in public than in residential OHCAs (64.1% vs. 35.9%). Short EMS response times had higher 30-day survival probability. Bystander defibrillation resulted in higher probability of 30-day survival for EMS response times of 7–9 min (survival ratio 1.24 [95% CI: 1.03; 1.49]) in public OHCAs in the adjusted model, when compared to non-bystander defibrillated patients. Conclusions: With EMS response times of 7–9 min, we detected a clear 30-day survival benefit for bystander defibrillated patients in public locations. No 30-day survival benefits were seen for other EMS response time intervals or in residential locations.

2025-03-13 — Posttraumatic Arthritis After Anterior Cruciate Ligament Injury: Machine Learning Comparison Between Surgery and Nonoperative Management

Authors: Yining Lu, Kevin Jurgensmeier, Abhinav Lamba, Linjun Yang, Mario Hevesi, Christopher L. Camp, A. Krych, Michael J. Stuart
Year: 2025
Publication Date: 2025-03-13
Venue: American Journal of Sports Medicine
DOI: 10.1177/03635465251322803
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Nonoperative and operative management techniques after anterior cruciate ligament (ACL) injury are both appropriate treatment options for selected patients. However, the subsequent development of posttraumatic knee osteoarthritis (PTOA) remains an area of active study. Purpose: To compare the risk of PTOA between patients treated without surgery and with ACL reconstruction (ACLR) after primary ACL disruption using a machine learning causal inference model. Study Design: Cohort study; Level of evidence, 3. Methods: A geographic database identified patients undergoing ACLR between 1990 and 2016 with minimum 7.5-year follow-up. Variables collected include age, sex, body mass index, activity level, occupation, relevant comorbid diagnoses, radiographic findings, injury characteristics, and clinical course. Treatment effects of ACLR on the development of PTOA and progression to total knee arthroplasty (TKA) were analyzed with machine learning models (MLMs) in a causal inference estimator (targeted maximum likelihood estimation, TMLE), while controlling for confounders. Results: The study included 1194 patients with a minimum follow-up of 7.5 years, among whom 974 underwent primary reconstruction and 220 underwent nonoperative treatment. A total of 215 (22%) patients developed symptomatic PTOA in the ACLR group compared with 140 (64%) in the nonoperative treatment group (P < .001), whereas 25 (3%) patients underwent TKA in the ACLR group compared with 50 (23%) in the nonoperative treatment group (P < .001). Patients in the ACLR group had delayed TKA compared with patients in the nonoperative treatment group (193.4 vs 166.0 months, respectively; P = .02). TMLE evaluation revealed that reconstruction decreased the risk of PTOA by 11% (95% CI, 8%-13%; P < .001) compared with nonoperative treatment but did not demonstrate a significant effect on the rate of progression to TKA. 
Survival analysis with random forest algorithm demonstrated significant delay to the onset of PTOA as well as time to progression of TKA in patients undergoing ACLR. Additional risk factors for the development of PTOA, irrespective of treatment, included older age at injury, greater body mass index, total number of arthroscopic knee surgeries, and residual laxity at follow-up. Conclusion: MLMs in a causal inference estimator found ACLR to exert a significant treatment effect in reducing the rate of development of PTOA by 11% compared with nonoperative treatment. ACLR also delayed the onset of PTOA and progression to TKA.

2025-03-13 — Guidelines and Best Practices for the Use of Targeted Maximum Likelihood and Machine Learning When Estimating Causal Effects of Exposures on Time‐To‐Event Outcomes

Authors: D. Talbot, Awa Diop, M. Mésidor, Y. Chiu, C. Sirois, Andrew J Spieker, A. Pariente, P. Noize, Marc Simard, M. L. Luque Fernández, Michael Schomaker, Kenji Fujita, Danijela Gnjidic, Mireille E. Schnitzer
Year: 2025
Publication Date: 2025-03-13
Venue: Statistics in Medicine
DOI: 10.1002/sim.70034
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Targeted maximum likelihood estimation (TMLE) is an increasingly popular framework for the estimation of causal effects. It requires modeling both the exposure and outcome but is doubly robust in the sense that it is valid if at least one of these models is correctly specified. In addition, TMLE allows for flexible modeling of both the exposure and outcome with machine learning methods. This provides better control for measured confounders since the model specification automatically adapts to the data, instead of needing to be specified by the analyst a priori. Despite these methodological advantages, TMLE remains less popular than alternatives in part because of its less accessible theory and implementation. While some tutorials have been proposed, none address the case of a time‐to‐event outcome. This tutorial provides a detailed step‐by‐step explanation of the implementation of TMLE for estimating the effect of a point binary or multilevel exposure on a time‐to‐event outcome, modeled as counterfactual survival curves and causal hazard ratios. The tutorial also provides guidelines on how best to use TMLE in practice, including aspects related to study design, choice of covariates, controlling biases and use of machine learning. R‐code is provided to illustrate each step using simulated data (https://github.com/detal9/SurvTMLE). To facilitate implementation, a general R function implementing TMLE with options to use machine learning is also provided. The method is illustrated in a real‐data analysis concerning the effectiveness of statins for the prevention of a first cardiovascular disease among older adults in Québec, Canada, between 2013 and 2018.
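To make the double-robustness idea in the abstract above concrete, here is a minimal Python sketch of TMLE for a much simpler setting than the tutorial's: a binary point exposure, a binary outcome, and one binary confounder. It is not the authors' R code and does not handle time-to-event outcomes. The function name `tmle_ate`, the simulated data, and the deliberately crude initial outcome model (which ignores the confounder) are all assumptions made for illustration.

```python
import math
import random


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_ate(W, A, Y):
    """Hypothetical helper: TMLE of the average treatment effect (ATE)."""
    n = len(Y)

    # Step 1: initial outcome model Q(A, W). Deliberately misspecified:
    # it uses the arm-specific mean of Y and ignores the confounder W.
    def arm_mean(a):
        ys = [y for ai, y in zip(A, Y) if ai == a]
        return sum(ys) / len(ys)

    q_init = {0: arm_mean(0), 1: arm_mean(1)}
    naive = q_init[1] - q_init[0]  # confounded difference in means

    # Step 2: propensity score g(W) = P(A = 1 | W), estimated per stratum.
    g = {}
    for w in (0, 1):
        arm = [a for wi, a in zip(W, A) if wi == w]
        g[w] = sum(arm) / len(arm)

    # Step 3: "clever covariate" H(A, W) built from the propensity score.
    def h(a, w):
        return a / g[w] - (1 - a) / (1.0 - g[w])

    H = [h(a, w) for a, w in zip(A, W)]
    offset = [logit(q_init[a]) for a in A]

    # Step 4: targeting step. Fit eps in logit Q* = logit Q + eps * H
    # by Newton's method on the one-parameter logistic likelihood.
    eps = 0.0
    for _ in range(50):
        p = [expit(o + eps * hi) for o, hi in zip(offset, H)]
        score = sum(hi * (y - pi) for hi, y, pi in zip(H, Y, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(H, p))
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break

    # Step 5: plug the updated outcome model into the ATE formula.
    def q_star(a, w):
        return expit(logit(q_init[a]) + eps * h(a, w))

    ate = sum(q_star(1, w) - q_star(0, w) for w in W) / n
    # Empirical mean of H * (Y - Q*); the targeting step drives it to ~0.
    resid = sum(hi * (y - q_star(a, w))
                for hi, y, a, w in zip(H, Y, A, W)) / n
    return ate, naive, resid


# Simulated data (assumption): W confounds A and Y; the true ATE is 0.2.
random.seed(1)
n = 5000
W = [int(random.random() < 0.5) for _ in range(n)]
A = [int(random.random() < (0.7 if w else 0.3)) for w in W]
Y = [int(random.random() < 0.2 + 0.2 * a + 0.4 * w) for a, w in zip(A, W)]

ate, naive, resid = tmle_ate(W, A, Y)
print(f"naive difference in means: {naive:.3f}")
print(f"TMLE estimate of the ATE:  {ate:.3f}")
```

In this sketch the outcome model is wrong on purpose while the propensity model is saturated, so the targeting step is what removes the confounding; the naive difference in means should sit well above the simulated effect of 0.2.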

2025-03-11 — Improving the within-node estimation of survival trees while retaining interpretability

Authors: Haolin Li, Yiyang Fan, Jianwen Cai
Year: 2025
Publication Date: 2025-03-11
Venue: Journal of Applied Statistics
DOI: 10.1080/02664763.2025.2473535
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
In statistical learning for survival data, survival trees are favored for their capacity to detect complex relationships beyond parametric and semiparametric models. Despite this, their prediction accuracy is often suboptimal. In this paper, we propose a new method based on super learning to improve the within-node estimation and overall survival prediction accuracy, while preserving the interpretability of the survival tree. Simulation studies reveal the proposed method's superior finite sample performance compared to conventional approaches for within-node estimation in survival trees. Furthermore, we apply this method to analyze the North Central Cancer Treatment Group Lung Cancer Data, cardiovascular medical records from the Faisalabad Institute of Cardiology, and the integrated genomic data of ovarian carcinoma with The Cancer Genome Atlas project.

2025-03-11 — Abstract MP10: Using Target Trials and Machine Learning to Estimate Heterogeneous Treatment Effects of First-Line Antihypertensive Medications

Authors: Jingzhi Yu, Abel Kho, L. Petito, Norrina Allen
Year: 2025
Publication Date: 2025-03-11
Venue: Circulation
DOI: 10.1161/cir.151.suppl_1.mp10
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Introduction: Guidelines for hypertension (HTN) treatment make universal recommendations, ignoring prior research demonstrating differences in BP reduction between HTN medication classes in individuals with different clinical presentations. Here, we leveraged causal inference methods to emulate randomized trials to assess heterogeneity in the effectiveness of first-line HTN treatments. Methods: A hypothetical target trial was designed to assess comparative effectiveness of first-line HTN medications in subgroups defined by clinical presentation. For its emulation, patients with new HTN diagnosis (2014-2022) at Northwestern Medicine (NM) who initiated medication treatment (ACEi, ARB, CCB, Diuretics, CCB or Diuretic combination treatment [with ACEi/ARB]) were selected from the NM electronic health records. Eligibility criteria were: no prior prescription of HTN medication, 1+ outpatient visit in the year before diagnosis, and no pregnancy within a year of HTN diagnosis. The primary outcome was achieving a blood pressure of under 140/90 mm Hg after six months. The analysis involved three steps: using g-computation to estimate individual treatment effects (ITE) of antihypertensive medications, employing Causal Forest to identify key clinical variables affecting ITE, and using targeted maximum likelihood estimation with SuperLearner to estimate treatment effects within patient subgroups. Results: Our study included 22,433 eligible patients from NM. We created 16 patient subgroups by using commonly used clinical thresholds of systolic BP, low-density lipid cholesterol, age, and BMI to partition the overall patient population. Statistically significant differences in treatment effects were found in several subgroups; e.g., in subgroup 1, diuretics were found to perform significantly better than all other HTN medications (Figure). Conclusions: Personalized HTN treatment strategies have the potential to improve BP control rates in many patient populations.

2025-03-10 — Assessing the impact of insulin resistance trajectories on cardiovascular disease risk using longitudinal targeted maximum likelihood estimation

Authors: Yaning Feng, L. Yin, Haoran Huang, Yongheng Hu, Sitong Lin
Year: 2025
Publication Date: 2025-03-10
Venue: Cardiovascular Diabetology
DOI: 10.1186/s12933-025-02651-6
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Cardiovascular disease (CVD) is closely associated with Insulin Resistance (IR). However, there is limited research on the relationship between trajectories of IR and CVD incidence, considering both time-invariant and time-varying confounders. We employed advanced causal inference methods to evaluate the longitudinal impact of IR trajectories on CVD risk. The data for this study were extracted from a Chinese nationwide cohort, named China Health and Retirement Longitudinal Study (CHARLS). Triglyceride-glucose (TyG) index and TyG body mass index (BMI) were used as surrogate markers for IR, and their changes were recorded as exposures. Longitudinal targeted maximum likelihood estimation (LTMLE) was used to study how dynamic shifts in IR trajectories (i.e., increase, decrease, etc.) influence long-term CVD risk, adjusting for both time-invariant and time-varying confounders. A total of 3,966 participants were included in the analysis, with 2,152 (54.3%) being female. The average age at baseline was 58.28 years. Over the course of a 7-year follow-up period, 499 (12.6%) participants developed CVD. Four distinct trajectories of TyG index and TyG-BMI were identified: low stable, increasing, decreasing, and high stable. LTMLE analyses revealed individuals in the ‘high stable’ and ‘increasing’ groups had a significantly higher risk of developing CVD compared to those in the ‘low stable’ group, while the ‘decreasing’ group showed no significant differences. Specifically, when the exposure was set as TyG-BMI, the odds of CVD in the ‘high stable’ group were 1.694 (95% CI: 1.361–2.108) times higher than in the ‘low stable’ group. Similar trends were observed across other models, with ORs of 1.708 (95% CI: 1.367–2.134) in Model 2, 1.389 (95% CI: 1.083–1.782) in Model 3, 1.675 (95% CI: 1.185–2.366) in Model 4, and 1.375 (95% CI: 1.07–1.768) in Model 5. When the exposure was changed to the TyG index, the results remained consistent, with a slightly lower magnitude of the odds ratios.
High stable and increasing TyG-BMI and TyG index trajectories were associated with the risk of CVD. TyG-BMI consistently exhibited higher odds ratios (ORs) of CVD risk when comparing with TyG index. Early identification of IR trajectories could provide insights for preventing CVD later in life.

2025-03-07 — Ensemble-learning approach improves fracture prediction using genomic and phenotypic data

Authors: Qing Wu, Jongyun Jung
Year: 2025
Publication Date: 2025-03-07
Venue: Osteoporosis International
DOI: 10.1007/s00198-025-07437-w
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study presents an innovative ensemble machine learning model integrating genomic and clinical data to enhance the prediction of major osteoporotic fractures in older men. The Super Learner (SL) model achieved superior performance (AUC = 0.76, accuracy = 95.6%, sensitivity = 94.5%, specificity = 96.1%) compared to individual models. Ensemble machine learning improves fracture prediction accuracy, demonstrating the potential for personalized osteoporosis management. Existing fracture risk models have limitations in their accuracy and in integrating genomic data. This study developed and validated an innovative ensemble machine learning (ML) model that combines multiple algorithms and integrates clinical, lifestyle, skeletal, and genomic data to enhance prediction for major osteoporotic fractures (MOF) in older men. This study analyzed data from 5130 participants in the Osteoporotic Fractures in Men cohort study. The model incorporated 1103 individual genome-wide significant variants and conventional risk factors of MOF. The participants were randomly divided into training (80%) and testing (20%) sets. Seven ML algorithms were combined using the SL ensemble method with tenfold cross-validation for MOF prediction. Model performance was evaluated on the testing set using the area under the curve (AUC), the area under the precision-recall curve, calibration, accuracy, sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and reclassification metrics. SL model performance was evaluated by comparison with baseline models and subgroup analyses by race. The SL model demonstrated the best performance with an AUC of 0.76, accuracy of 95.6%, sensitivity of 94.5%, specificity of 96.1%, NPV of 95.1%, and PPV of 94.7%. Among the individual ML algorithms, gradient boosting performed best.
The SL model outperformed baseline models, and it also achieved accuracies of 93.1% for Whites and 91.6% for Minorities, outperforming single ML in subgroup analysis. The ensemble learning approach significantly improved fracture prediction accuracy and model performance compared to individual ML. Integrating genomic and phenotypic data via the SL approach represents a promising advancement for personalized osteoporosis management.
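As a rough illustration of the stacking idea behind the Super Learner entries in this list, the following Python sketch combines two toy base learners using cross-validated predictions and a convex weight chosen to minimize cross-validated squared error. It is not the method of the study above: the base learners (`fit_mean`, `fit_line`), the grid search over a single weight, and the simulated data are all assumptions picked to keep the example dependency-free.

```python
import random


def fit_mean(xs, ys):
    """Base learner 1: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base learner 2: simple least-squares line on one feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def cv_predictions(fit, xs, ys, k):
    """Out-of-fold predictions: each point is predicted by a model
    trained on the other k - 1 folds (folds assigned by index mod k)."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def super_learner(xs, ys, k=10, grid=101):
    """Pick the convex weight on the line learner that minimizes
    cross-validated MSE, then refit both learners on all the data."""
    z_mean = cv_predictions(fit_mean, xs, ys, k)
    z_line = cv_predictions(fit_line, xs, ys, k)
    best_w, best_risk = 0.0, float("inf")
    for step in range(grid):
        w = step / (grid - 1)
        blend = [w * a + (1 - w) * b for a, b in zip(z_line, z_mean)]
        risk = mse(blend, ys)
        if risk < best_risk:
            best_w, best_risk = w, risk
    final_mean = fit_mean(xs, ys)
    final_line = fit_line(xs, ys)

    def predict(x):
        return best_w * final_line(x) + (1 - best_w) * final_mean(x)

    return predict, best_w, best_risk


# Simulated data (assumption): a noisy linear signal.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.5) for x in xs]

predict, w_line, cv_risk = super_learner(xs, ys, k=10)
risk_mean = mse(cv_predictions(fit_mean, xs, ys, 10), ys)
risk_line = mse(cv_predictions(fit_line, xs, ys, 10), ys)
print(f"weight on line learner: {w_line:.2f}")
print(f"CV risk: ensemble={cv_risk:.3f} line={risk_line:.3f} mean={risk_mean:.3f}")
```

Because the weight grid includes 0 and 1, the ensemble's cross-validated risk can never be worse than that of either base learner on the same folds. A realistic library would use many learners and a constrained regression for the weights rather than a one-dimensional grid.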

2025-03-04 — Comparing machine learning classifier models in discriminating cognitively unimpaired older adults from three clinical cohorts in the Alzheimer’s disease spectrum: demonstration analyses in the COMPASS-ND study

Authors: Harrison Fah, Linzy Bohn, Russell Greiner, Roger A. Dixon
Year: 2025
Publication Date: 2025-03-04
Venue: Frontiers in Aging Neuroscience
DOI: 10.3389/fnagi.2025.1542514
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Research in aging, impairment, and Alzheimer’s disease (AD) often requires powerful computational models for discriminating between clinical cohorts and identifying early biomarkers and key risk or protective factors. Machine Learning (ML) approaches represent a diverse set of data-driven tools for performing such tasks in big or complex datasets. We present systematic demonstration analyses to compare seven frequently used ML classifier models and two eXplainable Artificial Intelligence (XAI) techniques on multiple performance metrics for a common neurodegenerative disease dataset. The aim is to identify and characterize the best performing ML and XAI algorithms for the present data. Method: We accessed a Canadian Consortium on Neurodegeneration in Aging dataset featuring four well-characterized cohorts: Cognitively Unimpaired (CU), Subjective Cognitive Impairment (SCI), Mild Cognitive Impairment (MCI), and AD (N = 255). All participants contributed 102 multi-modal biomarkers and risk factors. Seven ML algorithms were compared along six performance metrics in discriminating between cohorts. Two XAI algorithms were compared using five performance and five similarity metrics. Results: Although all ML models performed relatively well in the extreme-cohort comparison (CU/AD), the Super Learner (SL), Random Forest (RF) and Gradient-Boosted trees (GB) algorithms excelled in the challenging near-cohort comparisons (CU/SCI). For the XAI interpretation comparison, SHapley Additive exPlanations (SHAP) generally outperformed Local Interpretable Model agnostic Explanation (LIME) in key performance properties. Conclusion: The ML results indicate that two tree-based methods (RF and GB) are reliable and effective as initial models for classification tasks involving discrete clinical aging and neurodegeneration data.
In the XAI phase, SHAP performed better than LIME due to lower computational time (when applied to RF and GB) and incorporation of feature interactions, leading to more reliable results.

2025-03-01 — Time-varying confounders in association between general and central obesity and coronary heart disease: Longitudinal targeted maximum likelihood estimation on atherosclerosis risk in communities study

Authors: Hossein Mozafar Saadati, Niloufar Taherpour, S. H. Hashemi Nazari
Year: 2025
Publication Date: 2025-03-01
Venue: Global Epidemiology
DOI: 10.1016/j.gloepi.2025.100193
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2025-03-01 — Calibration of Low-Cost LoRaWAN-Based IoT Air Quality Monitors Using the Super Learner Ensemble: A Case Study for Accurate Particulate Matter Measurement

Authors: Gokul Balagopal, Lakitha O. H. Wijeratne, John Waczak, Prabuddha Hathurusinghe, Mazhar Iqbal, Daniel Kiv, Adam Aker, Seth Lee, Vardhan Agnihotri, Christopher Simmons, D. Lary
Year: 2025
Publication Date: 2025-03-01
Venue: Sensors
DOI: 10.3390/s25051614
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study calibrates an affordable, solar-powered LoRaWAN air quality monitoring prototype using the research-grade Palas Fidas Frog sensor. Motivated by the need for sustainable air quality monitoring in smart city initiatives, this work integrates low-cost, self-sustaining sensors with research-grade instruments, creating a cost-effective hybrid network that enhances both spatial coverage and measurement accuracy. To improve calibration precision, the study leverages the Super Learner machine learning technique, which optimally combines multiple models to achieve robust PM (Particulate Matter) monitoring in low-resource settings. Data were collected by co-locating the Palas sensor and LoRaWAN devices under various climatic conditions to ensure reliability. The LoRaWAN monitor measures PM concentrations alongside meteorological parameters such as temperature, pressure, and humidity. The collected data were calibrated against precise PM concentrations and particle count densities from the Palas sensor. Various regression models were evaluated, with the stacking-based Super Learner model outperforming traditional approaches, achieving an average test R² value of 0.96 across all target variables, including 0.99 for PM2.5 and 0.91 for PM10.0. This study presents a novel approach by integrating Super Learner-based calibration with LoRaWAN technology, offering a scalable solution for low-cost, high-accuracy air quality monitoring. The findings demonstrate the feasibility of deploying these sensors in urban areas such as the Dallas-Fort Worth metroplex, providing a valuable tool for researchers and policymakers to address air pollution challenges effectively.

2025-02-25 — Management of high-risk acute pulmonary embolism: an emulated target trial analysis

Authors: A. Stadlbauer, Tom Verbelen, Leonhard Binzenhöfer, Tomaz Goslar, Alexander Supady, P. M. Spieth, Marko Noč, A. Verstraete, S. Hoffmann, Michael Schomaker, Julia Höpler, Marie Kraft, Esther Tautz, Daniel Hoyer, Jörn Tongers, Franz Haertel, A. El-Essawi, Mostafa Salem, R. Rangel, Carsten Hullermann, Marvin Kriz, B. Schrage, J. Moisés, M. Sabaté, F. Pappalardo, L. Crusius, N. Mangner, Christoph Adler, T. Tichelbäcker, Carsten Skurk, Christian Jung, S. Kufner, T. Graf, C. Scherer, L. V. Villegas Sierra, Hannah Billig, N. Majunke, W. Speidl, R. Zilberszac, Luis Chiscano-Camón, A. Uribarri, Jordi Riera, R. Roncon-Albuquerque, E. Terauda, A. Erglis, Guido Tavazzi, U. Zeymer, M. Knorr, Juliane Kilo, Sven Möbius-Winkler, R. H. Schwinger, Derk Frank, Oliver Borst, H. Häberle, F. De Roeck, Christiaan J M Vrints, C. Schmid, G. Nickenig, Christian Hagl, Steffen Massberg, Andreas Schäfer, D. Westermann, Sebastian Zimmer, Alain Combes, D. Camboni, Holger Thiele, E. Lüsebrink, Tom Adriaenssens, Hugo Lanz, Nils Gade, Daniel Roden, Inas Saleh, Kirsten Krüger, Jochen Dutzmann, J. Sackarnd, B. Beer, Jeisson Osorio, Karsten Hug, Ingo Eitel, Evija Camane, Santa Strazdina, Līga Vīduša, Silvia Klinger, Antonia Wechsler, S. Peterss, N. Kneidinger, A. Montisci, K. Toischer
Year: 2025
Publication Date: 2025-02-25
Venue: Intensive Care Medicine
DOI: 10.1007/s00134-025-07805-4
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
High-risk acute pulmonary embolism (PE) is a life-threatening condition necessitating hemodynamic stabilization and rapid restoration of pulmonary perfusion. In this context, evidence regarding the benefit of advanced circulatory support and pulmonary recanalization strategies is still limited. In this observational study, we assessed data of 1060 patients treated for high-risk acute PE with 991 being included in a target trial emulation to investigate all-cause in-hospital mortality estimates with different advanced treatment strategies. The four treatment groups consisted of patients undergoing (I) veno-arterial extracorporeal membrane oxygenation (VA-ECMO) alone (n = 126), (II) intrahospital systemic thrombolysis (SYS) (n = 643), (III) surgical thrombectomy (ST) (n = 49), and (IV) percutaneous catheter-directed treatment (PCDT) (n = 173). VA-ECMO was allowed as bridging to pulmonary recanalization in groups II, III, and IV. Marginal causal contrasts were estimated using the g-formula with logistic regression models as the primary approach. Sensitivity analyses included targeted maximum likelihood estimation (TMLE) with machine learning, inverse probability of treatment weighting (IPTW), as well as variations of estimands, handling of missing values, and a complete target trial emulation excluding the VA-ECMO alone group. In the overall target trial population, the median age was 62.0 years, and 53.3% of patients were male. The estimated probability of in-hospital mortality from the primary target trial intention-to-treat analysis for VA-ECMO alone was 57% (95% confidence interval [CI] 47%; 67%), compared to 48% (95% CI 44%; 53%) for intrahospital SYS, 34% (95% CI 18%; 50%) for ST, and 43% (95% CI 35%; 51%) for PCDT. The mortality risk ratios were largely in favor of any advanced recanalization strategy over VA-ECMO alone. The robustness of these findings was supported by all sensitivity analyses.
In the crude outcome analysis, patients surviving to discharge had a high probability of favorable neurologic outcome in all treatment groups. Advanced recanalization by means of SYS, ST, and several promising catheter-directed systems may have a positive impact on short-term survival of patients presenting with high-risk PE compared to the use of VA-ECMO alone as a bridge to recovery.

2025-02-20 — Estimating Absolute Protein–Protein Binding Free Energies by a Super Learner Model

Authors: E. J. F. Chaves, João Sartori, Whendel M Santos, Carlos H B Cruz, Emmanuel N Mhrous, Manassés F Nacimento-Filho, M. Ferraz, Roberto D. Lins
Year: 2025
Publication Date: 2025-02-20
Venue: Journal of Chemical Information and Modeling
DOI: 10.1021/acs.jcim.4c01641
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Protein–protein binding is central to most biochemical processes of all living beings. Its importance underlies mechanisms ranging from cell interactions to metabolic control, but also to ex vivo biotechnology, such as the development of therapeutic monoclonal antibodies, the engineering of enzymes for industrial biocatalysis, the development of biosensors for disease detection, and the assembly of artificial protein complexes for drug screening. Therefore, predicting the strength of their association allows for understanding the molecular mechanisms and ultimately controlling them. We devised a machine learning ensemble model that uses Rosetta-based quantities to predict binding free energies of protein–protein complexes with accuracy rivaling both computationally demanding methods and currently available ML/DL tools. The method was encoded into an application Python pipeline named PBEE, which stands for Protein Binding Energy Estimator, allowing a rapid calculation of the absolute binding free energies of protein complexes from their PDB coordinates.

2025-02-19 — Pharmacokinetic interaction assessment of an HIV broadly neutralizing monoclonal antibody VRC07-523LS: a cross-protocol analysis of three phase 1 trials in people without HIV

Authors: T. D. Chawana, Stephen R. Walsh, Lynda Stranix-Chibanda, Z. Chirenje, Chenchen Yu, Lily Zhang, K. Seaton, Jack R. Heptinstall, Lu Zhang, Carmen A. Paez, Theresa Gamble, S. Karuna, P. Andrew, Brett Hanscom, M. Sobieszczyk, Srilatha Edupuganti, Cynthia L. Gay, Sharon Mannheimer, Christopher B Hurt, Kathryn E. Stephenson, L. Polakowski, Hans Spiegel, Margaret Yacovone, Stephanie Regenold, Catherine Yen, Jane A G Baumblatt, L. Gama, D. Barouch, E. Piwowar-Manning, R. Koup, Georgia D. Tomaras, O. Hyrien, Alison C. Roxby, Yunda Huang
Year: 2025
Publication Date: 2025-02-19
Venue: BMC Immunology
DOI: 10.1186/s12865-025-00687-7
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
VRC07-523LS is a safe and well-tolerated monoclonal antibody (mAb) targeting the CD4 binding site on the HIV envelope (Env) trimer. Efficacy of VRC07-523LS, in combination with mAbs targeting other HIV epitopes, will be evaluated in upcoming trials to prevent HIV acquisition in adults. However, differences in the pharmacokinetics (PK) of VRC07-523LS when administered alone vs. in combination with other mAbs have not been formally assessed. We performed a cross-protocol analysis of three clinical trials and included data from a total of 146 adults without HIV who received intravenous (n = 95) or subcutaneous (n = 51) VRC07-523LS, either alone (‘single’; n = 100) or in combination with 1 or 2 other mAbs (‘combined’; n = 46). We used an open, two-compartment population PK model to describe serum concentrations of VRC07-523LS over time, accounting for inter-individual variabilities. We compared individual-level PK parameters between the combined vs. single groups using the targeted maximum likelihood estimation method to adjust for participant characteristics. No significant differences were observed in clearance rate, inter-compartmental clearance, distribution half-life, or total VRC07-523LS exposure over time. However, for the combined group, mean central volume of distribution, peripheral volume of distribution, and elimination half-life were slightly greater, corresponding to slightly lower predicted concentrations early post-administration with high levels being maintained in both groups. These results suggest potential PK interactions between VRC07-523LS and other mAbs, but with small clinical impact in the context of HIV prevention. Our findings support coadministration of VRC07-523LS with other mAbs, and the use of the developed PK models to design future trials for HIV prevention.

2025-02-16 — Inconsistent consistency: evaluating the well-defined intervention assumption in applied epidemiological research.

Authors: Jerzy Eisenberg-Guyot, Katrina L. Kezios, Seth J. Prins, Sharon Schwartz
Year: 2025
Publication Date: 2025-02-16
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyaf015
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
BACKGROUND According to textbook guidance, satisfying the well-defined intervention assumption is key for estimating causal effects. However, no studies have systematically evaluated how the assumption is addressed in research. Thus, we reviewed how researchers using g-methods or targeted maximum likelihood estimation (TMLE) interpreted and addressed the well-defined intervention assumption in epidemiological studies. METHODS We reviewed observational epidemiological studies that used g-methods or TMLE, were published from 2000-21 in epidemiology journals with the six highest 2020 impact factors and met additional criteria. Among other factors, reviewers assessed if authors of included studies aimed to estimate the effects of hypothetical interventions. Then, among such studies, reviewers assessed whether authors discussed key causal-inference assumptions (e.g. consistency or treatment variation irrelevance), how they interpreted their findings and if they specified well-defined interventions. RESULTS Just 20% (29/146) of studies aimed to estimate the effects of hypothetical interventions. Of such intervention-effect studies, almost none (1/29) stated 'how' the exposure would be intervened upon; among those that did not state a 'how', the 'how' mattered for consistency (i.e., for treatment variation irrelevance) in 64% of studies (18/28). Moreover, whereas 79% (23/29) of intervention-effect studies mentioned consistency, just 45% (13/29) interpreted findings as corresponding to the effects of hypothetical interventions. Finally, reviewers determined that just 38% (11/29) of intervention-effect studies had well-defined interventions. CONCLUSIONS We found substantial deviations between guidelines regarding meeting the well-defined intervention assumption and researchers' application of the guidelines, with authors of intervention-effect studies rarely critically examining the assumption's validity, let alone specifying well-defined interventions.

2025-02-12 — Development of machine learning models for predicting non-remission in early RA highlights the robust predictive importance of the RAID score-evidence from the ARCTIC study

Authors: Gaoyang Li, Shrikant S Kolan, Franco Grimolizzi, Joseph Sexton, Giulia Malachin, G. Goll, Tore K. Kvien, Nina Paulshus Sundlisæter, M. Zucknick, S. Lillegraven, E. Haavardsholm, B. S. Skålhegg
Year: 2025
Publication Date: 2025-02-12
Venue: Frontiers in Medicine
DOI: 10.3389/fmed.2025.1526708
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Introduction Achieving remission is a critical therapeutic goal in the management of rheumatoid arthritis (RA). Despite methotrexate being the cornerstone of early RA treatment, a significant proportion of patients fail to achieve remission. This study aims to predict 6-month non-remission in 222 disease-modifying anti-rheumatic drug (DMARD)-naïve RA patients initiating methotrexate monotherapy, using baseline patient characteristics from the ARCTIC trial. Methods Machine learning models were developed utilizing twenty-one baseline demographic, clinical and laboratory features to predict non-remission according to ACR/EULAR Boolean, SDAI and CDAI criteria. The model employed a super learner algorithm that combines three base algorithms: elastic net, random forest and support vector machine. The model performance was evaluated through five independent unseen tests with nested 5-fold cross-validation. The predictive power of each feature was assessed using a composite measure derived from individual algorithm estimates. Results The model demonstrated a mean AUC-ROC of 0.75-0.76, with mean sensitivity of 0.77-0.81, precision (also referred to as Positive Predictive Value) of 0.77-0.79 and specificity of 0.63-0.66 across the criteria. Predictive power analysis of each feature identified the baseline Rheumatoid Arthritis Impact of Disease (RAID) score as the strongest predictor of non-remission. A simplified model using RAID score alone demonstrated comparable performance to the full-feature model. Conclusion These findings highlight the potential utility of a baseline RAID score-based model as an effective tool for early identification of patients at risk of non-remission in clinical practise.

2025-02-06 — Potential source of bias in AI models: Lactate measurement in the ICU as a template

Authors: Nebal S. Abu Hussein, P. Pradhan, F. W. Haug, D. Moukheiber, Lama Moukheiber, M. Moukheiber, Sulaiman Moukheiber, L. Weishaupt, J. Ellen, H. D'Couto, I.C. Williams, L. A. Celi, J. Matos, T. Struja
Year: 2025
Publication Date: 2025-02-06
Venue: Research Square
DOI: 10.21203/rs.3.rs-5836145/v1
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective: Health inequities may be driven by demographics such as sex, language proficiency, and race-ethnicity. These disparities may manifest through likelihood of testing, which in turn can bias artificial intelligence models. The goal of this study is to evaluate variation in serum lactate measurements in the Intensive Care Unit (ICU). Methods: Utilizing MIMIC-IV (2008–2019), we identified adults fulfilling sepsis-3 criteria. Exclusion criteria were ICU stay <1-day, unknown race-ethnicity, <18 years of age, and recurrent stays. Employing targeted maximum likelihood estimation analysis, we assessed the likelihood of a lactate measurement on day 1. For patients with a measurement on day 1, we evaluated the predictors of subsequent readings. Results: We studied 15,601 patients (19.5% racial-ethnic minority, 42.4% female, and 10.0% limited English proficiency). After adjusting for confounders, Black patients had a slightly higher likelihood of receiving a lactate measurement on day 1 (odds ratio 1.19, 95% confidence interval (CI) 1.06–1.34), but not the other minority groups. Subsequent frequency was similar across race-ethnicities, but women had a lower incidence rate ratio (IRR) 0.94 (95% CI 0.90–0.98). Interestingly, patients with elective admission and private insurance also had a higher frequency of repeated serum lactate measurements (IRR 1.70, 95% CI 1.61–1.81, and 1.07, 95% CI, 1.02–1.12, respectively). Conclusion: We found no disparities in the likelihood of a lactate measurement among patients with sepsis across demographics, except for a small increase for Black patients, and a reduced frequency for women. Variation in biomarker monitoring can be a source of data bias when modeling patient outcomes, and thus should be accounted for in every analysis.

2025-02-01 — Treatment heterogeneity of water, sanitation, hygiene, and nutrition interventions on child growth by environmental enteric dysfunction and pathogen status for young children in Bangladesh

Authors: Zachary Butzin-Dozier, Yunwen Ji, Jeremy Coyle, Ivana Malenica, Elizabeth T. Rogawski McQuade, J. Grembi, J. Platts-Mills, E. Houpt, Jay P Graham, Shahjahan Ali, M. Rahman, Mohammad Alauddin, S. L. Famida, S. Akther, Md. Saheen Hossen, P. Mutsuddi, A. Shoab, Mahbubur Rahman, Md Ohedul Islam, Rana Miah, M. Taniuchi, Jie Liu, Sarah T. Alauddin, Christine P. Stewart, Stephen P Luby, J. Colford, Alan Hubbard, Andrew N. Mertens, A. Lin
Year: 2025
Publication Date: 2025-02-01
Venue: PLoS Neglected Tropical Diseases
DOI: 10.1371/journal.pntd.0012881
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Background Water, sanitation, hygiene (WSH), nutrition (N), and combined (N+WSH) interventions are often implemented by global health organizations, but WSH interventions may insufficiently reduce pathogen exposure, and nutrition interventions may be modified by environmental enteric dysfunction (EED), a condition of increased intestinal permeability and inflammation. This study investigated the heterogeneity of these treatments’ effects based on individual pathogen and EED biomarker status with respect to child linear growth. Methods We applied cross-validated targeted maximum likelihood estimation and super learner ensemble machine learning to assess the conditional treatment effects in subgroups defined by biomarker and pathogen status. We analyzed treatment (N+WSH, WSH, N, or control) randomly assigned in-utero, child pathogen and EED data at 14 months of age, and child HAZ at 28 months of age. We estimated the difference in mean child height for age Z-score (HAZ) under the treatment rule and the difference in stratified treatment effect (treatment effect difference) comparing children with high versus low pathogen/biomarker status while controlling for baseline covariates. Results We analyzed data from 1,522 children who had a median HAZ of −1.56. We found that fecal myeloperoxidase (N+WSH treatment effect difference 0.0007 HAZ, WSH treatment effect difference 0.1032 HAZ, N treatment effect difference 0.0037 HAZ) and Campylobacter infection (N+WSH treatment effect difference 0.0011 HAZ, WSH difference 0.0119 HAZ, N difference 0.0255 HAZ) were associated with greater effect of all interventions on anthropometry. In other words, children with high myeloperoxidase or Campylobacter infection experienced a greater impact of the interventions on anthropometry. 
We found that a treatment rule that assigned the N+WSH (HAZ difference 0.23, 95% CI (0.05, 0.41)) and WSH (HAZ difference 0.17, 95% CI (0.04, 0.30)) interventions based on EED biomarkers and pathogens increased predicted child growth compared to the randomly allocated intervention. Conclusions These findings indicate that EED biomarkers and pathogen status, particularly Campylobacter and myeloperoxidase (a measure of gut inflammation), may be related to the impact of N+WSH, WSH, and N interventions on child linear growth.

2025-02-01 — The Effect of a Life-Stage Based Intervention on Depression in Youth Living with HIV in Kenya and Uganda: Results from the SEARCH-Youth Trial

Authors: F. Mwangwa, Jason Johnson-Peretz, James Peng, L.B Balzer, Janice Litunya, Janet Nakigudde, D. Black, Lawrence Owino, Cecilia Akatukwasa, Anjeline Onyango, Fredrick Atwine, Titus M O Arunga, J. Ayieko, M. Kamya, Diane V. Havlir, C. S. Camlin, Ted Ruel
Year: 2025
Publication Date: 2025-02-01
Venue: Tropical Medicine and Infectious Disease
DOI: 10.3390/tropicalmed10020055
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Depression among adolescents and young adults with HIV affects both their wellbeing and clinical care outcomes. Integrated care models are needed. We hypothesized that the SEARCH-Youth intervention, a life-stage-based care model that improved viral suppression, would reduce depressive symptoms as compared to the standard of care. We conducted a mixed-methods study of youth with HIV aged 15–24 years in SEARCH-Youth, a cluster-randomized trial in rural Uganda and Kenya (NCT03848728). Depression was assessed cross-sectionally with the PHQ-9 screening tool and compared by arm using targeted minimum loss-based estimation. In-depth semi-structured interviews with young participants, family members, and providers were analyzed using a modified framework of select codes pertaining to depression. We surveyed 1,234 participants (median age 21 years, 80% female). Having any depressive symptoms was less common in the intervention arm (53%) compared to the control (73%), representing a 28% risk reduction (risk ratio: 0.72; CI: 0.59–0.89). Predictors of at least mild depression included pressure to have sex, physical threats, and recent major life events. Longitudinal qualitative research among 113 participants found that supportive counseling from providers helped patients build confidence and coping skills. Integrated models of care that address social threats, adverse life events, and social support can be used to reduce depression among adolescents and young adults with HIV.

2025-02-01 — PNL: a software to build polygenic risk scores using a super learner approach based on PairNet, a Convolutional Neural Network

Authors: Ting-Huei Chen, Chia-Jung Lee, Syue-Pu Chen, Shang-Jung Wu, C. S. J. Fann
Year: 2025
Publication Date: 2025-02-01
Venue: Bioinformatics
DOI: 10.1093/bioinformatics/btaf071
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Summary Polygenic risk scores (PRSs) hold promise for early disease diagnosis and personalized treatment, but their overall discriminative power remains limited for many diseases in the general population. As a result, numerous novel PRS modeling techniques have been developed to improve predictive performance, but determining the most effective method for a specific application remains uncertain until tested. Hence, we introduce a novel, versatile tool for building an optimized PRS model by integrating candidate models from multiple existing PRS building methods that use target population data and/or incorporating information from other populations through a trans-ethnic approach. Our tool, PNL, is based on the PairNet algorithm, a Convolutional Neural Network with low computational complexity through a simple pairing operation. In the case studies for asthma, type 2 diabetes, and vertigo, the optimal PRS model generated with PNL using only Taiwan biobank (TWB) data achieved Area Under the Curves (AUCs) that matched or improved the best results using other methods individually. Incorporating the UK Biobank (UKBB) data further improved performance of PNL for asthma and type 2 diabetes. For vertigo, unlike the other diseases, individual method analysis showed that UKBB data alone generally produced lower AUCs compared to TWB data alone. As a result, incorporating UKBB data did not improve AUC with PNL, suggesting that increasing the number of candidate models does not necessarily result in higher AUC values, alleviating concerns about overfitting. Availability and implementation The python code for PairNet algorithm incorporated in PNL is freely available on: https://github.com/FannLab/pairnet. An archived, citable version is stored on: https://doi.org/10.5281/zenodo.14838227.

2025-02-01 — Improving Vapor Pressure Prediction Through Integration of Multiple Molecular Representations: A Super Learner Approach

Authors: Ji Hyun Nam, Seul Lee, Seongil Jo, Jaeoh Kim, Jooyeon Lee, Jahyun Koo, Byounghwak Lee, Keunhong Jeong, Donghyeon Yu
Year: 2025
Publication Date: 2025-02-01
Venue: Journal of Chemometrics
DOI: 10.1002/cem.70003
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Accurate prediction of vapor pressure is essential in chemical engineering, environmental science, and pharmaceutical development, impacting the volatility and stability of compounds. Traditional methods often fall short for complex and new molecular structures. This study introduces an advanced machine learning approach, integrating graph neural networks (GNNs) and Chem‐BERT models to improve prediction accuracy. Utilizing the largest dataset to date, we derived comprehensive chemical descriptors and fingerprints. We evaluated 19 predictive models, including ridge regression, random forest, support vector regression, and feed‐forward neural networks, trained on diverse features like PaDEL and Morgan fingerprints, chemical descriptors, and Chem‐BERT embeddings. Central to our methodology is the super learner architecture, which combines the 19 models to enhance accuracy. The super learner achieved a root mean squared error (RMSE) of 0.8200, outperforming individual models and previous reports. These successful results highlight the effectiveness of integrating GNNs and Chem‐BERT for capturing detailed molecular information, setting a new benchmark for vapor pressure prediction. This study underscores the value of advanced machine learning techniques and comprehensive datasets, offering a robust tool for researchers and paving the way for future advancements in chemical property prediction.

2025-01-31 — Score-Preserving Targeted Maximum Likelihood Estimation

Authors: Noel Pimentel, Alejandro Schuler, M. V. D. Laan
Year: 2025
Publication Date: 2025-01-31
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation, tmle

Abstract:
Targeted maximum likelihood estimators (TMLEs) are asymptotically optimal among regular, asymptotically linear estimators. In small samples, however, we may be far from "asymptopia" and not reap the benefits of optimality. Here we propose a variant (score-preserving TMLE; SP-TMLE) that leverages an initial estimator defined as the solution of a large number of possibly data-dependent score equations. Instead of targeting only the efficient influence function in the TMLE update to knock out the plug-in bias, we also target the already-solved scores. Solving additional scores reduces the remainder term in the von-Mises expansion of our estimator because these scores may come close to spanning higher-order influence functions. The result is an estimator with better finite-sample performance. We demonstrate our approach in simulation studies leveraging the (relaxed) highly adaptive lasso (HAL) as our initial estimator. These simulations show that in small samples SP-TMLE has reduced bias relative to plug-in HAL and reduced variance relative to vanilla TMLE, blending the advantages of the two approaches. We also observe improved estimation of standard errors in small samples.

2025-01-30 — U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms

Authors: Rui Duan
Year: 2025
Publication Date: 2025-01-30
Venue: arXiv.org
DOI: 10.48550/arXiv.2501.18084
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration. However, selecting the best model for a specific target population remains challenging due to issues like limited transferability, data heterogeneity, and the difficulty of obtaining true labels or outcomes in real-world settings. In this paper, we propose an unsupervised model aggregation method, U-aggregation, designed to integrate multiple pre-trained models for enhanced and robust performance in new populations. Unlike existing supervised model aggregation or super learner approaches, U-aggregation assumes no observed labels or outcomes in the target population. Our method addresses limitations in existing unsupervised model aggregation techniques by accommodating more realistic settings, including heteroskedasticity at both the model and individual levels, and the presence of adversarial models. Drawing on insights from random matrix theory, U-aggregation incorporates a variance stabilization step and an iterative sparse signal recovery process. These steps improve the estimation of individuals' true underlying risks in the target population and evaluate the relative performance of candidate models. We provide a theoretical investigation and systematic numerical experiments to elucidate the properties of U-aggregation. We demonstrate its potential real-world application by using U-aggregation to enhance genetic risk prediction of complex traits, leveraging publicly available models from the PGS Catalog.

2025-01-29 — An Estimator-Robust Design for Augmenting Randomized Controlled Trial with External Real-World Data

Authors: Sky Qiu, J. Tarp, Andrew Mertens, M. V. D. Laan
Year: 2025
Publication Date: 2025-01-29
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Augmenting randomized controlled trials (RCTs) with external real-world data (RWD) has the potential to improve the finite sample efficiency of treatment effect estimators. We describe using adaptive targeted maximum likelihood estimation (A-TMLE) for estimating the average treatment effect (ATE) by decomposing the ATE estimand into two components: a pooled-ATE estimand that combines data from both the RCT and external sources, and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. This approach views the RCT data as the reference and corrects for inconsistencies of any kind between the RCT and the external data source. Given the growing abundance of external RWD from modern electronic health records, determining the optimal strategy to select candidate external patients for data integration remains an open yet critical problem. In this work, we begin by analyzing the robustness property of the A-TMLE estimator and then propose a matching-based sampling strategy that improves the robustness of the estimator with respect to the target estimand. Our proposed strategy is outcome-blind and involves matching based on two one-dimensional scores: the trial enrollment score and the propensity score in the external data. We demonstrate in simulations that our sampling strategy improves the coverage and shortens the widths of confidence intervals produced by A-TMLE. We illustrate our method with a case study of augmenting the DEVOTE cardiovascular safety trial by using the Optum Clinformatics claims database.

2025-01-06 — How to select predictive models for decision-making or causal inference

Authors: M. Doutreligne, G. Varoquaux
Year: 2025
Publication Date: 2025-01-06
Venue: GigaScience
DOI: 10.1093/gigascience/giaf016
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: We investigate which procedure selects the most trustworthy predictive model to explain the effect of an intervention and support decision-making. Methods: We study a large variety of model selection procedures in practical settings: finite-sample settings and without a theoretical assumption of well-specified models. Beyond standard cross-validation or internal validation procedures, we also study elaborate causal risks. These build proxies of the causal error using “nuisance” reweighting to compute it on the observed data. We evaluate whether empirically estimated nuisances, which are necessarily noisy, add noise to model selection and compare different metrics for causal model selection in an extensive empirical study based on a simulation and 3 health care datasets based on real covariates. Results: Among all metrics, the mean squared error, classically used to evaluate predictive models, is worse. Reweighting it with a propensity score does not bring much improvement in most cases. On average, the $R\text{-risk}$, which uses as nuisances a model of mean outcome and propensity scores, leads to the best performances. Nuisance corrections are best estimated with flexible estimators such as a super learner. Conclusions: When predictive models are used to explain the effect of an intervention, they must be evaluated with different procedures than standard predictive settings, using the $R\text{-risk}$ from causal inference.
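As a rough illustration of the R-risk mentioned in this abstract, the sketch below scores candidate treatment-effect predictions using the residual-on-residual squared error commonly associated with the R-learner. It assumes the two nuisances (mean outcome and propensity score) have already been estimated; the function name `r_risk` and all variable names are hypothetical and are not taken from the paper.

```python
def r_risk(y, a, m_hat, e_hat, tau_hat):
    """Average of ((y - m(x)) - (a - e(x)) * tau(x))^2 over the sample.

    y       : observed outcomes
    a       : binary treatment indicators (0 or 1)
    m_hat   : estimated mean outcome E[Y | X] for each unit (nuisance)
    e_hat   : estimated propensity score P(A = 1 | X) for each unit (nuisance)
    tau_hat : candidate treatment-effect predictions being evaluated
    """
    total = 0.0
    for yi, ai, mi, ei, ti in zip(y, a, m_hat, e_hat, tau_hat):
        total += ((yi - mi) - (ai - ei) * ti) ** 2
    return total / len(y)


# Toy data (assumed): baseline outcome 1, constant true effect 2, balanced treatment.
y = [3.0, 1.0, 3.0, 1.0]
a = [1, 0, 1, 0]
m_hat = [2.0] * 4   # E[Y | X] = 1 + 0.5 * 2
e_hat = [0.5] * 4

risk_true_effect = r_risk(y, a, m_hat, e_hat, [2.0] * 4)
risk_no_effect = r_risk(y, a, m_hat, e_hat, [0.0] * 4)
```

On this toy data the candidate that predicts the true effect receives a lower risk than the candidate that predicts no effect, which is the ranking behaviour a model selection metric of this kind is meant to provide.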

2025-01-01 — Symmetrical "super learning": Enhancing causal learning using a bidirectional probabilistic outcome.

Authors: Santiago Castiello, Gabriella FitzGerald, G. Aisbitt, A. G. Baker, Robin A Murphy
Year: 2025
Publication Date: 2025-01-01
Venue: Journal of Experimental Psychology: Animal Learning and Cognition
DOI: 10.1037/xan0000390
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
In a learning environment, with multiple predictive cues for a single outcome, cues interfere with or enhance each other during the acquisition process (e.g., Baker et al., 1993). Previous experiments have focused on cues that signal the presence or absence of binary outcomes. This introduces a perceptual and perhaps motivational asymmetry between excitatory and inhibitory learning. Here, using a bidirectional outcome, we asked whether learning about both generative (incremental positive outcome) and preventative (incremental negative outcome) causal cues show similar enhancement effects in opposite directions. In three experiments with humans using predictive learning tasks, participants (N = 133) were exposed to probabilistic predictive cues for opposite polarity events. Generative cues caused an increase in outcome likelihood, while preventative cues decreased it. An analysis of explicit predictive ratings found evidence for symmetrical learning and enhanced learning for both generative and preventative cues. The results are discussed in relation to super learning, an effect derived from theories of competitive learning based on error correction and theories of contrasting probability estimates.

2025 — survivalSL: an R Package for Predicting Survival by a Super Learner

Authors: Camille Sabathe, Yohann Foucher
Year: 2025
Venue: The R Journal
DOI: 10.32614/rj-2024-037
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2025 — Remote Sensing and Mapping of Fine Woody Carbon With Satellite Imagery and Super Learner

Authors: Riyaaz Uddien Shaik, Mohamad Alipour, Eric Rowell, Adam C. Watts, C. Woodall, E. Taciroğlu
Year: 2025
Venue: IEEE Geoscience and Remote Sensing Letters
DOI: 10.1109/LGRS.2024.3503585
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Deadwood is a critical component of forest ecosystems, storing nutrients for plants and serving as a carbon store and emission source. Climate change influences forest ecosystem dynamics with the potential for deadwood to emit carbon more rapidly due to accelerated decay and increased wildfires and increased inputs via mass forest mortality and disturbance events. To objectively inform our understanding of wildfires and associated carbon emissions, this study estimates the carbon content of dead fine woody debris (FWD) using multimodal data, such as Landsat-8 multispectral imagery, Sentinel-1 (C-band) and PALSAR (L-band) synthetic aperture radar (SAR) imagery, and terrain features to estimate the FWD of less than 0.25 in (1 h), 0.25–1 in (10 h), and 1–3 in (100 h). This data fusion provides spectral information to assess vegetation health that correlates with deadwood, as well as penetrability from SAR, resulting in structural information and biomass sensitivity. An ensemble machine learning (ML) model was trained using measurements from the Forest Inventory and Analysis (FIA) Database. A feature importance analysis was also performed to investigate the importance of input features to the model’s performance. A super learner regression (SLR) model composed of 9 base learners, including an ElasticNet model as meta-learner, was proposed and achieved the $R^{2}$ values of 0.75, 0.72, and 0.62 to estimate 1-, 10-, and 100-h FWD, respectively. The validated model was then used to estimate deadwood carbon in the 2021 Dixie Fire region of California, demonstrating the effectiveness of our approach, emphasizing the value of multimodal data for real-time FWD carbon stock estimation.

2025-01-01 — Predicting Stunted Growth in Two Year Old Bangladeshi Children via the Super Learner

Authors: Heather L. Cook, Jennie Z. Ma, Daniel M. Keenan, Jeffrey R. Donowitz, Beth D. Kirkpatrick, Rashidul Haque, Uma Nayak, William A. Petri Jr.
Year: 2025
Publication Date: 2025-01-01
Venue: Journal of Data Science
DOI: 10.6339/25-jds1197
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Stunted growth in children is a worldwide issue which may cause long term problems for individuals stunted as early as two years of age. However, predicting stunted growth with accuracy is quite complex, but machine learning poses a distinct advantage in this regard. While several techniques are available for predictive modeling, the Super Learner stands out as an ensemble method that integrates multiple algorithms into a single predictive model with enhanced performance. In this study, the Super Learner model, comprising generalized linear model, bagged trees, random forests, conditional random forest, stochastic gradient boosting, Bayesian additive regression trees, neural networks, and model averaged neural networks, achieved high performance with high area under the receiver operating characteristic curve, Brier Score, and the minimum of precision and recall values. However, after analyzing the results from cross validation, the final model selected was the Bayesian additive regression trees. Within the final model, the height-for-age z-score at one year, income, expenditure, anti-lipopolysaccharide antibody at week 6 and at week 18, plasma retinol binding protein at week 6, plasma soluble cluster designation 14 at week 18, fecal Reg 1B at week 12, vitamin D at week 18, mother’s weight and height at enrollment, fecal calprotectin at week 12, fecal myeloperoxidase at week 12, number of days of diarrhea through the first year of life, and the number of days of exclusive breastfeeding through the first year of life emerged as the top important variables for predicting stunted growth at two years of age.

2025-01-01 — P0377 Development of Crohn’s Aid mobile application based on AI algorithms to make an early diagnosis of Crohns Disease in TB endemic regions

Authors: S. Mohta, R. Kutum, R. Pendyala, H. Dev, B. Kante, S. Kumar Vuyyuru, P. Kumar, S. Virmani, M. Kumar, S. Bahl, G. Makharia, S. Chaudhury, S. Kedia, T. Sethi, V. Ahuja
Year: 2025
Publication Date: 2025-01-01
Venue: Journal of Crohn's & Colitis
DOI: 10.1093/ecco-jcc/jjae190.0551
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Intestinal tuberculosis (ITB) and Crohn’s disease (CD) mimic each other clinically, endoscopically, and radiologically and are difficult to differentiate. This is a major barrier to make an early diagnosis of CD in TB endemic areas. We aimed to develop a high accuracy machine learning-based model which could be easily implemented in an innovative way. We retrospectively analyzed data from 1066 patients. After data cleaning, 796 patients (514 CD & 282 ITB) and data from 28 variables were included. A super learner approach using random forest and support vector machine was used for differentiating ITB from CD. Data was divided into 80 percent training and 20 percent testing data sets followed by 10-fold cross validation. The optimal cut-off for the diagnosis was obtained using the Youden index-measure to optimize the balance between sensitivity and specificity and the model was evaluated at multiple thresholds for clinical utility. The best performing model was incorporated into a mobile phone-based application. Prospective validation on 37 patients was carried out with similar accuracy. The random forest model achieved a sensitivity, specificity and accuracy of 0.92, 0.83, 0.86 respectively and performed better than the support vector machine model trained with linear and radial basis functions. The random forest model was found to have the best AUROC with a cutoff of 0.4, predicting the diagnosis of CD with a sensitivity of 93%, specificity of 83%, and accuracy of 86%, positive predictive value of 76%, and negative predictive value of 95%. The random forest model was used for creation of the application. The app is being made available on smartphones free of cost for use by any physician. Our model differentiated between ITB and CD with high accuracy and has the potential to make an early diagnosis of CD. The free to use mobile application would make implementation of this algorithm much easier, allowing for widespread use in clinical practice and helping make more informed decisions. More data from multiple centers and different geographical locations would aid in further improving the model performance.
Figure 1: Overall model training approach. Figure 2: Model performance and selection of appropriate cut-off.
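The Youden-index cut-off selection described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes predicted probabilities and binary labels are available, and the function name `youden_cutoff` and the toy data are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    scores : predicted probabilities of the positive class
    labels : true classes, 1 for positive and 0 for negative
    A unit is classified positive when its score is >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):  # every observed score is a candidate
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy scores (assumed): the classes separate perfectly at 0.4.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
labels = [0, 0, 0, 1, 1, 1]
cutoff, j_stat = youden_cutoff(scores, labels)
```

In practice the cut-off would be chosen on training or validation data and then checked on held-out data, as the train/test split in the abstract suggests.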

2025 — Modelo predictivo de las metas de aproximación al dominio: un análisis con inteligencia artificial

Authors: Francisco Quiñonez Tapia, María de Lourdes Vargas Garduño
Year: 2025
Venue: Revista Electrónica de Investigación Educativa
DOI: 10.24320/redie.2025.27.of.6555
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The aim of the study was to use artificial intelligence (AI) to predict Mastery-Approach Goals in secondary school students from the variables collected by PISA in 2018. The sample was drawn from the PISA 2018 database and comprised 339,560 participants from 60 countries. Methodologically, machine learning was applied using the Super Learner algorithm with 14 candidate algorithms; deep learning was also applied using a neural network with 107 parameters. The analyses showed that the student's perceived emotional support from parents, Competitiveness, Meaning in Life, Self-Efficacy, and Awareness of Intercultural Communication were predictors of Mastery-Approach Goals in secondary school students. The research contributes to the study of the factors that influence mastery-approach goals in secondary school students, with analytical models that help advance the development of predictive education science with AI.

2025-01-01 — Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning

Authors: Daniel M. Mwanga, Isaac C. Kipchirchir, G. Muhua, Charles R. Newton, Damazo T. Kadengye, Abankwah Junior, Albert Akpalu, Arjune Sen, Bruno P. Mmbando, Charles R. Newton, C. Sottie, D. Bhwana, Daniel Mtai Mwanga, Damazo T. Kadengye, Daniel Nana Yaw, David McDaid, Dorcas Muli, E. Darkwa, F. M. Wekesah, G. Asiki, Gergana Manolova, Guillaume Pages, Helen Cross, Henrika Kimambo, Isolide S. Massawe, J. Sander, Mary A. Bitta, Mercy Atieno, Neerja Chowdhary, Patrick Adjei, Peter O. Otieno, Ryan G Wagner, Richard Walker, S. Asiamah, S. Iddi, Simone Grassi, S. Mahone, Sonia Vallentin, Stella Waruingi, S. Kariuki, Tarun Dua, Thomas Kwasa, Timothy Denison, T. Godi, Vivian P. Mushi, W. Matuja
Year: 2025
Publication Date: 2025-01-01
Venue: Global Epidemiology
DOI: 10.1016/j.gloepi.2025.100183
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Attrition is a challenge in parameter estimation in both longitudinal and multi-stage cross-sectional studies. Here, we examine the utility of machine learning to predict attrition and identify associated factors in a two-stage population-based epilepsy prevalence study in Nairobi. Methods: All individuals in the Nairobi Urban Health and Demographic Surveillance System (NUHDSS) (Korogocho and Viwandani) were screened for epilepsy in two stages. Attrition was defined as probable epilepsy cases identified at stage-I but who did not attend stage-II (neurologist assessment). Categorical variables were one-hot encoded, class imbalance was addressed using synthetic minority over-sampling technique (SMOTE) and numeric variables were scaled and centered. The dataset was split into training and testing sets (7:3 ratio), and seven machine learning models, including the ensemble Super Learner, were trained. Hyperparameters were tuned using 10-fold cross-validation, and model performance was evaluated using metrics like area under the curve (AUC), accuracy, Brier score and F1 score over 500 bootstrap samples of the test data. Results: Random forest (AUC = 0.98, accuracy = 0.95, Brier score = 0.06, and F1 = 0.94), extreme gradient boost (XGB) (AUC = 0.96, accuracy = 0.91, Brier score = 0.08, F1 = 0.90) and support vector machine (SVM) (AUC = 0.93, accuracy = 0.93, Brier score = 0.07, F1 = 0.92) were the best performing models (base learners). Ensemble Super Learner had similarly high performance. Important predictors of attrition included proximity to industrial areas, male gender, employment, education, smaller households, and a history of complex partial seizures. Conclusion: These findings can help researchers plan targeted mobilization for scheduled clinical appointments to improve follow-up rates. These findings will inform development of a web-based algorithm to predict attrition risk and aid in targeted follow-up efforts in similar studies.

2025-01-01 — Highly adaptive Lasso for estimation of heterogeneous treatment effects and treatment recommendation

Authors: Sohail Nizam, Allison Codi, Elizabeth Rogawski McQuade, D. Benkeser
Year: 2025
Publication Date: 2025-01-01
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2023-0085
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
The estimation of conditional average treatment effects (CATEs) is an important problem in many applications. Many machine learning-based frameworks for such estimation have been proposed, including meta-learning, causal trees, and causal forests. However, few of these methods are interpretable, while those that do emphasize interpretability often suffer in terms of performance. Here, we propose several methods that build on existing meta-learning algorithms to produce CATE estimates that can be represented as trees. We also describe new methods for the estimation of optimal treatment policies (OTPs), an area where interpretable, auditable treatment decision rules are often desirable. We introduce this method for settings with an arbitrary number of treatment arms. We provide regret rates for the proposed methods and show that they outperform popular methods, both interpretable and not. Finally, we demonstrate the use of our method on both simulated and real data from the Antibiotics for Children with severe Diarrhea trial to create OTPs for antibiotic treatment.

2025 — Designing Optimal Dynamic Treatment Regimes Using TMLE for Personalized Math Course-Taking Plans

Authors: Chenguang Pan
Year: 2025
Venue: Proceedings of the 2025 AERA Annual Meeting
DOI: 10.3102/2190733
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2024 (135 papers)
2024-12-31 — Semiparametric efficient estimation of small genetic effects in large-scale population cohorts

Authors: Olivier Labayle, Breeshey Roskams-Hieter, Joshua Slaughter, Kelsey Tetley-Campbell, M. J. van der Laan, C. P. Ponting, S. Beentjes, A. Khamseh
Year: 2024
Publication Date: 2024-12-31
Venue: Biostatistics
DOI: 10.1093/biostatistics/kxaf030
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Population genetics seeks to quantify DNA variant associations with traits or diseases, as well as interactions among variants and with environmental factors. Computing millions of estimates in large cohorts in which small effect sizes and tight confidence intervals are expected, necessitates minimizing model-misspecification bias to increase power and control false discoveries. We present TarGene, a unified statistical workflow for the semi-parametric efficient and double robust estimation of genetic effects including $k$-point interactions among categorical variables in the presence of confounding and weak population dependence. $k$-point interactions, or Average Interaction Effects (AIEs), are a direct generalization of the usual average treatment effect (ATE). We estimate genetic effects with cross-validated and/or weighted versions of Targeted Minimum Loss-based Estimators (TMLE) and One-Step Estimators (OSE). The effect of dependence among data units on variance estimates is corrected by using sieve plateau variance estimators based on genetic relatedness across the units. We present extensive realistic simulations to demonstrate power, coverage, and control of type I error. Our motivating application is the targeted estimation of genetic effects on trait, including two-point and higher-order gene-gene and gene-environment interactions, in large-scale genomic databases such as UK Biobank and All of Us. All cross-validated and/or weighted TMLE and OSE for the AIE $k$-point interaction, as well as ATEs, conditional ATEs and functions thereof, are implemented in the general purpose Julia package TMLE.jl. For high-throughput applications in population genomics, we provide the open-source Nextflow pipeline and software TarGene which integrates seamlessly with modern high-performance and cloud computing platforms.

2024-12-30 — O POTENCIAL PREDITOR DA CONSCIÊNCIA FONOLÓGICA EM SEUS COMPONENTES SILÁBICO E FONÊMICO NA APRENDIZAGEM INICIAL DA LEITURA E DA ESCRITA

Authors: Luise Rebouças Leite Leal dos Santos
Year: 2024
Publication Date: 2024-12-30
Venue: Repositório Digital de Teses e Dissertações do PPGLin-UESB
DOI: 10.54221/rdtdppglinuesb.2024.v12i1.274
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
A substantial body of research shows an important correlation between Phonological Awareness and the initial learning of reading and writing, including its predictive potential for anticipating possible cases of school delay. However, few studies investigate the constituent components of Phonological Awareness, particularly syllabic and phonemic awareness. This dissertation presents the results of a longitudinal study that aims to address this scientific gap. Five classes of schoolchildren in the process of learning to read and write, aged between six and seven years, participated in this study. These children attended municipal public schools in the municipality of Vitória da Conquista. The independent variables, namely syllabic and phonemic awareness, were assessed at the beginning of the literacy process using the CONFIAS test. Reading and writing performance was monitored on five occasions, at intervals of approximately two to three months between each administration. This assessment was conducted using the TMLE test (Teste de Monitoramento da Leitura e da Escrita, a reading and writing monitoring test). This test makes it possible to track the development of reading and writing skills, providing relevant information about individuals' performance in these domains. The data were discussed in light of the Dynamicist Paradigm, also known as Complex Adaptive Systems (CAS), and through a systematic review of the documented literature. The data analysis showed that the predictive potential of the constituents of Phonological Awareness is equivalent. At p < 0.001, the correlation coefficient between reading and writing performance and syllabic awareness ranged from R = 0.48 to R = 0.56, and between reading and writing performance and phonemic awareness from 0.42 to 0.58. The results allow the conclusion that both syllabic awareness and phonemic awareness are equivalent predictors and moderately predict reading and writing performance.

2024-12-23 — Advancing Web Security: Machine Learning-Based Attack Detection with Optimized Features

Authors: Sainath Patil, Rajesh Bansode
Year: 2024
Publication Date: 2024-12-23
Venue: Panamerican Mathematical Journal
DOI: 10.52783/pmj.v35.i2s.2938
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Web applications remain highly susceptible to cyberattacks despite efforts to mitigate these threats through initiatives like the OWASP Top 10. This research addresses the critical challenge of improving web attack detection by integrating advanced feature extraction techniques and machine learning (ML) models. A novel approach was developed, consisting of three main contributions: the design of a testbed attack network to simulate real-world scenarios, the application of a wrapper-based feature extraction method combining mutual information (MI) and genetic algorithms (GA) to select pertinent traffic features, and the implementation of a super learning ensemble model for robust attack detection. The feature selection method extracted features included critical traffic attributes, which significantly enhanced the model's detection capabilities. The proposed ensemble-based super learner model achieved a remarkable accuracy of 99.12% with a reduced prediction time of 125 milliseconds. Compared to conventional ML models, the proposed model demonstrated a 26% improvement in detection accuracy and a 99% reduction in prediction time, making it highly efficient for real-time web attack detection.

2024-12-21 — Prediction of surface roughness of tempered steel AISI 1060 under effective cooling using super learner machine learning

Authors: Firi Ziyad, Habtamu Alemayehu, Desalegn Wogaso, Firomsa Dadi
Year: 2024
Publication Date: 2024-12-21
Venue: The International Journal of Advanced Manufacturing Technology
DOI: 10.1007/s00170-024-14952-3
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-12-19 — Assessing Treatment Effects in Observational Data With Missing Confounders: A Comparative Study of Practical Doubly-Robust and Traditional Missing Data Methods.

Authors: Brian D. Williamson, Chloe Krakauer, Eric Johnson, Susan Gruber, Bryan E. Shepherd, M. J. van der Laan, T. Lumley, Hana Lee, J. J. Hernández-Muñoz, Fengyu Zhao, Sarah K Dutcher, R. Desai, Gregory E. Simon, S. Shortreed, Jennifer C. Nelson, Pamela A. Shaw
Year: 2024
Publication Date: 2024-12-19
Venue: Statistics in Medicine
DOI: 10.1002/sim.70366
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In pharmacoepidemiology, safety and effectiveness are frequently evaluated using readily available administrative and electronic health records data. In these settings, detailed confounder data are often not available in all data sources and therefore missing on a subset of individuals. Multiple imputation (MI) and inverse-probability weighting (IPW) are go-to analytical methods to handle missing data and are dominant in the biomedical literature. Doubly-robust methods, which are consistent under fewer assumptions, can be more efficient with respect to mean-squared error. We discuss two practical-to-implement doubly-robust estimators, generalized raking and inverse probability-weighted targeted maximum likelihood estimation (TMLE), which are both currently under-utilized in biomedical studies. We compare their performance to IPW and MI in a detailed numerical study for a variety of synthetic data-generating and missingness scenarios, including scenarios with rare outcomes and a high missingness proportion. Further, we consider plasmode simulation studies that emulate the complex data structure of a large electronic health records cohort in order to compare anti-depressant therapies in a rare-outcome setting where a key confounder is prone to more than 50% missingness. We provide guidance on selecting a missing data analysis approach, based on which methods excelled with respect to the bias-variance trade-off across the different scenarios studied.

2024-12-13 — What if we had built a prediction model with a survival super learner instead of a Cox model 10 years ago?

Authors: Arthur Chatton, Émilie Pilote, Kevin Assob Feugo, Héloise Cardinal, Robert W Platt, M. Schnitzer
Year: 2024
Publication Date: 2024-12-13
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Objective: This study sought to compare the drop in predictive performance over time according to the modeling approach (regression versus machine learning) used to build a kidney transplant failure prediction model with a time-to-event outcome. Study Design and Setting: The Kidney Transplant Failure Score (KTFS) was used as a benchmark. We reused the data from which it was developed (DIVAT cohort, n=2,169) to build another prediction algorithm using a survival super learner combining (semi-)parametric and non-parametric methods. Performance in DIVAT was estimated for the two prediction models using internal validation. Then, the drop in predictive performance was evaluated in the same geographical population approximately ten years later (EKiTE cohort, n=2,329). Results: In DIVAT, the super learner achieved better discrimination than the KTFS, with a tAUROC of 0.83 (0.79-0.87) compared to 0.76 (0.70-0.82). While the discrimination remained stable for the KTFS, it was not the case for the super learner, with a drop to 0.80 (0.76-0.83). Regarding calibration, the survival SL overestimated graft survival at development, while the KTFS underestimated graft survival ten years later. Brier score values were similar regardless of the approach and the timing. Conclusion: The more flexible SL provided superior discrimination on the population used to fit it compared to a Cox model and similar discrimination when applied to a future dataset of the same population. Both methods are subject to calibration drift over time. However, weak calibration on the population used to develop the prediction model was correct only for the Cox model, and recalibration should be considered in the future to correct the calibration drift.

2024-12-12 — Prediction of Full-Load Electrical Power Output of Combined Cycle Power Plant Using a Super Learner Ensemble

Authors: Yu-jeong Song, Ji-su Park, M. Suh, Chansoo Kim
Year: 2024
Publication Date: 2024-12-12
Venue: Applied Sciences
DOI: 10.3390/app142411638
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Combined Cycle Power Plants (CCPPs) generate electrical power through gas turbines and use the exhaust heat from those turbines to power steam turbines, resulting in 50% more power output compared to traditional simple cycle power plants. Predicting the full-load electrical power output (PE) of a CCPP is crucial for efficient operation and sustainable development. Previous studies have used machine learning models, such as the Bagging and Boosting models to predict PE. In this study, we propose employing Super Learner (SL), an ensemble machine learning algorithm, to enhance the accuracy and robustness of predictions. SL utilizes cross-validation to estimate the performance of diverse machine learning models and generates an optimal weighted average based on their respective predictions. It may provide information on the relative contributions of each base learner to the overall prediction skill. For constructing the SL, we consider six individual and ensemble machine learning models as base learners and assess their performances compared to the SL. The dataset used in this study was collected over six years from an operational CCPP. It contains one output variable and four input variables: ambient temperature, atmospheric pressure, relative humidity, and vacuum. The results show that the Boosting algorithms significantly influence the performance of the SL in comparison to the other base learners. The SL outperforms the six individual and ensemble machine learning models used as base learners. It indicates that the SL improves the generalization performance of predictions by combining the predictions of various machine learning models.
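The cross-validated weighting idea described in this abstract can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions: it uses only two toy base learners (a mean predictor and a one-variable least-squares line) instead of the six models in the study, and it finds the convex weight by grid search rather than by the constrained optimizers typically used in Super Learner software. All function names are hypothetical.

```python
def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    mu = sum(ys) / len(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Base learner 2: simple least-squares line on one input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def cv_predictions(fit, xs, ys, k=5):
    """Out-of-fold predictions: each point is predicted by a model not trained on it."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def super_learner(xs, ys, k=5, grid=101):
    """Return (weight on the mean learner, combined predictor)."""
    p_mean = cv_predictions(fit_mean, xs, ys, k)
    p_line = cv_predictions(fit_line, xs, ys, k)

    def cv_mse(w):
        # Cross-validated squared error of the convex combination.
        return sum((y - (w * a + (1 - w) * b)) ** 2
                   for y, a, b in zip(ys, p_mean, p_line)) / len(ys)

    weight = min((g / (grid - 1) for g in range(grid)), key=cv_mse)
    # Refit both learners on all data and combine them with the chosen weight.
    mean_model, line_model = fit_mean(xs, ys), fit_line(xs, ys)
    return weight, lambda x: weight * mean_model(x) + (1 - weight) * line_model(x)


# Toy data (assumed): an exactly linear relationship y = 3 + 2x.
xs = [float(i) for i in range(20)]
ys = [3.0 + 2.0 * x for x in xs]
weight, predict = super_learner(xs, ys)
```

The selected weights also indicate how much each base learner contributes to the ensemble, which is the kind of information the abstract refers to; on the toy data above all weight goes to the linear learner.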

2024-12-10 — Towards robust causal inference in epidemiologic research: employing double cross-fit TMLE in right heart catheterization data

Authors: Momenul Haque Mondol, Mohammad Ehsanul Karim
Year: 2024
Publication Date: 2024-12-10
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwae447
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Within epidemiologic research, estimating treatment effects from observational data presents notable challenges. Targeted maximum likelihood estimation (TMLE) emerges as a robust method, addressing these challenges by accurately modeling treatment effects. This approach uniquely combines the precision of correctly specified models with the versatility of data-adaptive, flexible machine learning algorithms. Despite its effectiveness, TMLE’s integration of complex algorithms can introduce bias and undercoverage. This issue is addressed through the double cross-fit TMLE (DC-TMLE) approach, enhancing accuracy and reducing biases inherent in observational studies. However, DC-TMLE’s potential remains underexplored in epidemiologic research, primarily due to the lack of comprehensive methodologic guidance and the complexity of its computational implementation. Recognizing this gap, our article contributes a detailed, reproducible guide for implementing DC-TMLE in R, aimed specifically at epidemiologic applications. We demonstrate the utility of this method using an openly available clinical data set, underscoring its relevance and adaptability for robust epidemiologic analysis. This guide aims to facilitate broader adoption of DC-TMLE in epidemiologic studies, promoting more accurate and reliable treatment effect estimations in observational research.

2024-12-06 — Estimating the treatment effect over time under general interference through deep learner integrated TMLE

Authors: Suhan Guo, Shen Furao, Ni Li
Year: 2024
Publication Date: 2024-12-06
Venue: arXiv.org
DOI: 10.48550/arXiv.2412.04799
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Understanding the effects of quarantine policies in populations with underlying social networks is crucial for public health, yet most causal inference methods fail here due to their assumption of independent individuals. We introduce DeepNetTMLE, a deep-learning-enhanced Targeted Maximum Likelihood Estimation (TMLE) method designed to estimate time-sensitive treatment effects in observational data. DeepNetTMLE mitigates bias from time-varying confounders under general interference by incorporating a temporal module and domain adversarial training to build intervention-invariant representations. This process removes associations between current treatments and historical variables, while the targeting step maintains the bias-variance trade-off, enhancing the reliability of counterfactual predictions. Using simulations of a "Susceptible-Infected-Recovered" model with varied quarantine coverages, we show that DeepNetTMLE achieves lower bias and more precise confidence intervals in counterfactual estimates, enabling optimal quarantine recommendations within budget constraints, surpassing state-of-the-art methods.

2024-12-02 — Correction: The Targeted Maximum Likelihood estimation to estimate the causal effects of the previous tuberculosis treatment in Multidrug-resistant tuberculosis in Sudan

Authors: A. Elduma, K. Holakouie-Naieni, Amir Almasi-Hashiani, A. Foroushani, Hamdan Mustafa Hamdan Ali, M. A. Adam, Asma Elsony, M. Mansournia
Year: 2024
Publication Date: 2024-12-02
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0314954
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
[This corrects the article DOI: 10.1371/journal.pone.0279976.].

2024-12-01 — Vasodilator drugs and heart-related outcomes in systemic sclerosis: an exploratory analysis

Authors: A. Guédon, Fabrice Carrat, Luc Mouthon, D. Launay, B. Chaigne, G. Pugnet, Jean-Christophe Lega, A. Hot, Vincent Cottin, C. Agard, Y. Allanore, A. Fauchais, Alain Lescoat, Robin Dhote, Thomas Papo, E. Chatelus, B. Bonnotte, Jean-Emmanuel Kahn, Elisabeth Diot, A. Aouba, N. Magy-Bertrand, V. Queyrel, Alain Le Quellec, Pierre Kieffer, Zahir Amoura, B. Granel, Jean-Baptiste Gaultier, M. Balquet, Denis Wahl, O. Lidove, O. Espitia, Ariel Cohen, O. Fain, E. Hachulla, A. Mékinian, Sébastien Rivière
Year: 2024
Publication Date: 2024-12-01
Venue: RMD Open
DOI: 10.1136/rmdopen-2024-004918
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background and aims: Systemic sclerosis (SSc) is an autoimmune connective disease characterised by excessive extracellular matrix deposition and widespread skin and internal organ fibrosis including various cardiac manifestations. Heart involvement is one of the leading causes of death among patients with SSc. In this study, we aimed to assess the effect of various vasodilator treatments. Methods: We used data from a national multicentric prospective study using the French SSc national database. We estimated the average treatment effect (ATE) of sildenafil, bosentan, angiotensin-converting enzyme (ACE) inhibitors and iloprost on diastolic dysfunction, altered ejection fraction <50% and pulmonary arterial hypertension (PAH) using a causal method, namely the longitudinal targeted minimum loss-based estimation, to adjust for confounding and informative censoring. Results: We included 1048 patients with available data regarding treatment. Regarding sildenafil analyses, the ATE on diastolic dysfunction at 3 years was −2.83% (95% CI −4.06; −1.60, p<0.00001), and the estimated ATE on altered ejection fraction <50% was −0.88% (95% CI −1.70; −0.05, p=0.037). We did not find a significant effect on PAH. Regarding bosentan, ACE inhibitors and iloprost, none showed a significant effect on diastolic dysfunction, altered ejection fraction <50% or PAH. Conclusions: Using causal methods, our study is the first and largest suggesting that sildenafil might have benefits among SSc patients regarding diastolic dysfunction and altered ejection fraction occurrence. However, further studies assessing the effect of vasodilators on heart-related outcome among SSc patients are needed to confirm those exploratory results.

2024-12-01 — PT41 Benefits of Inhaled Corticosteroids (ICS) in COPD Maintenance Combinations: Real-World Evidence Using Longitudinal Targeted Maximum Likelihood Estimation

Authors: J.P. Ekwaru, S. McMullen, T. Cowling, M. Bhutani, M. van der Laan
Year: 2024
Publication Date: 2024-12-01
Venue: Value in Health
DOI: 10.1016/j.jval.2024.10.3840
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2024-12-01 — Machine Learning Models Decoding the Association Between Urinary Stone Diseases and Metabolic Urinary Profiles

Authors: Lin Ma, Yi Qiao, Runqiu Wang, Hualin Chen, Guanghua Liu, He Xiao, Ran Dai
Year: 2024
Publication Date: 2024-12-01
Venue: Metabolites
DOI: 10.3390/metabo14120674
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Employing advanced machine learning models, we aim to identify biomarkers for urolithiasis from 24-h metabolic urinary abnormalities and study their associations with urinary stone diseases. Methods: We retrospectively recruited 468 patients at Peking Union Medical College Hospital who were diagnosed with urinary stone disease, including renal, ureteral, and multiple location stones, and had undergone a 24-h urine metabolic evaluation. We applied machine learning methods to identify biomarkers of urolithiasis from the urinary metabolite profiles. In total, 148 (34.02%) patients were with kidney stones, 34 (7.82%) with ureter stones, and 163 (34.83%) with multiple location stones, all of whom had detailed urinary metabolite data. Our analyses revealed that the Random Forest algorithm exhibited the highest predictive accuracy, with AUC values of 0.809 for kidney stones, 0.99 for ureter stones, and 0.775 for multiple location stones. The Super Learner Ensemble Method also demonstrated high predictive performance with slightly lower AUC values compared to Random Forest. Further analysis using multivariate logistic regression identified significant features for each stone type based on the Random Forest method. Results: We found that 24-h urinary magnesium was positively associated with both kidney stones and multiple location stones (OR = 1.195 [1.06–1.3525] and 1.3258 [1.1814–1.4949]) due to its high correlation with urinary phosphorus, while 24-h urinary creatinine was a protective factor for kidney stones and ureter stones, with ORs of 0.9533 [0.9117–0.996] and 0.8572 [0.8182–0.8959]. eGFR was a risk factor for ureter stones and multiple location stones, with ORs of 1.0145 [1.0084–1.0209] and 1.0148 [1.0077–1.0223]. Conclusion: Machine learning techniques show promise in revealing the links between urological stone disease and 24-h urinary metabolic data. Enhancing the prediction accuracy of these models leads to improved dietary or pharmacological prevention strategies.

2024-12-01 — HYPOTHETICAL INTERVENTIONS ON LONELINESS AND MEMORY FUNCTION AMONG US MIDDLE-AGED AND OLDER ADULTS

Authors: Ryo Ikesu, L. P. Rojas‐Saunero, Eleanor Hayes‐Larson, E. Mayeda
Year: 2024
Publication Date: 2024-12-01
Venue: Innovation in aging
DOI: 10.1093/geroni/igae098.0528
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Evidence has shown that middle-aged and older adults who experience persistent loneliness have lower memory function compared to those with transient loneliness or those who never feel lonely, but it remains unclear whether sustained interventions on loneliness, as opposed to a one-time intervention, are more effective in preserving late-life memory function. Researchers are increasingly using hypothetical intervention approaches to estimate the impact of population-level interventions with observational data. Using the nationally-representative Health and Retirement Study in 2006–2016 (n=16,977; median baseline age, 68), we estimated the population-level effects of two hypothetical interventions of loneliness on memory scores 8 years after baseline: [A] preventing loneliness at baseline and 4 years after baseline (sustained intervention at two time points) and [B] preventing loneliness only at baseline (one-time intervention). We used targeted maximum likelihood estimation to account for censoring and both baseline and time-varying confounding (health-related behaviors, working status, household wealth, social relationships, and health conditions). In this approach, we accounted for the possibility that people with lower memory function feel lonely more often than those with higher memory function. Compared to the natural course (i.e., no intervention), both sustained and one-time interventions were associated with slightly higher memory scores 8 years after baseline (0.026 standardized units [95% CI: 0.003–0.048] for the sustained intervention; 0.021 standardized units [95% CI: 0.005–0.037] for the one-time intervention). Our findings suggest that both sustained and one-time interventions on loneliness may preserve memory function at the population level.

2024-12-01 — How are energy R&D investments beneficial in ensuring energy transition: Evidence from leading R&D investing countries by novel super learner algorithm

Authors: U. Pata, M. Kartal, Serpil Kılıç Depren
Year: 2024
Publication Date: 2024-12-01
Venue: Sustainable Energy Technologies and Assessments
DOI: 10.1016/j.seta.2024.104084
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-12-01 — EE727 Determining Survival Impact and Cost-Effectiveness of Multi-Gene Panel Sequencing in Metastatic Colorectal Cancer With Super Learning Approaches

Authors: E. Krebs, D. Weymann, D. Regier
Year: 2024
Publication Date: 2024-12-01
Venue: Value in Health
DOI: 10.1016/j.jval.2024.10.1007
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2024-12-01 — Association between the age-adjusted Charlson Comorbidity Index and complications after kidney transplantation: a retrospective observational cohort study

Authors: Qin Huang, Tongsen Luo, Jirong Yang, Yaxin Lu, Shaoli Zhou, Ziqing Hei, Chaojin Chen
Year: 2024
Publication Date: 2024-12-01
Venue: BMC Nephrology
DOI: 10.1186/s12882-024-03888-1
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Complications following kidney transplantation elevate the risks of readmission and mortality. The aim of this study was to assess the association between the age-adjusted Charlson Comorbidity Index (ACCI) and postoperative complications among kidney transplant (KT) recipients. Between January 2015 and March 2021, a study involving 886 kidney transplant recipients at the Third Affiliated Hospital of Sun Yat-sen University was conducted. Postoperative complications were defined by the Clavien-Dindo Classification of Surgical Complications. Targeted Maximum Likelihood Estimation (TMLE) was employed to assess the association between ACCI and postoperative complications. The odds ratio (OR) was computed to determine the relationship between ACCI and postoperative complications. Subsequent interaction and stratified analyses were performed to assess the robustness of the findings. Out of 859 KT participants ultimately included in the study, 30.7% were documented to have encountered postoperative complications. Participants with an ACCI value exceeding 3 exhibited a notably increased risk of postoperative complications following multivariable adjustment [aOR = 1.64, 95% CI [1.21,2.21], p = 0.001]. Congestive heart failure (OR = 16.18, 95% CI [1.98–132.17], p < 0.001), peripheral vascular disease (OR = 2.32, 95% CI [1.48–3.78], p < 0.001), and chronic obstructive pulmonary disease (OR = 6.05, 95% CI [2.95–12.39], p < 0.001) emerged as the top three preoperative comorbidities significantly linked to postoperative complications in ACCI. An ACCI value exceeding 3 preoperatively constituted a risk factor for postoperative complications among KT patients.

2024-12-01 — Accelerated intelligent prediction and analysis of mechanical properties of magnesium alloys based on scaled super learner machine-learning algorithms

Authors: Atwakyire Moses, Ying Gui, Buzhuo Chen, Marembo Micheal, Ding Chen
Year: 2024
Publication Date: 2024-12-01
Venue: Mechanics of materials (Print)
DOI: 10.1016/j.mechmat.2024.105168
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-11-30 — Optimized Web Server Attack Detection: A Super Learner Ensemble Model Approach

Authors: Sainath Patil, Rajesh Bansode
Year: 2024
Publication Date: 2024-11-30
Venue: Advances in Nonlinear Variational Inequalities
DOI: 10.52783/anvi.v28.2564
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Web applications are essential for many organizations, yet they are vulnerable to various security threats, such as injection attacks and inadequate authentication mechanisms. To address these risks, this study proposes a super learner ensemble learning model that combines multiple machine learning (ML) algorithms to improve web server attack detection. Leveraging the unique strengths of each base ML model, the super learner approach enhances predictive accuracy by using a meta-model trained on out-of-fold predictions from base learners, achieving superior performance in identifying attacks. The proposed model was evaluated on the UNSW-NB 15 and KDD CUP 99 datasets, achieving impressive detection accuracies of 99.69% and 99.90%, respectively. This ensemble model effectively addresses challenges in cybersecurity, such as high false-positive rates and imbalanced data, by employing adaptive synthetic sampling and feature selection. Comparative analysis reveals that the super learner model outperforms existing detection methods, improving detection accuracy by up to 9.54%. These findings suggest that the super learner ensemble approach is a promising method for enhancing the security of web applications. Future work could expand on these results by exploring different base models, datasets, and real-time anomaly detection mechanisms to further improve web server protection.

2024-11-29 — Estimating the causal effect of dexamethasone versus hydrocortisone on the neutrophil-lymphocyte ratio in critically ill COVID-19 patients from Tygerberg Hospital ICU using TMLE method

Authors: Ivan Nicholas Nkuhairwe, Tonya M. Esterhuizen, L. Sigwadhi, J. L. Tamuzi, R. Machekano, P. Nyasulu
Year: 2024
Publication Date: 2024-11-29
Venue: BMC Infectious Diseases
DOI: 10.1186/s12879-024-10112-w
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Causal inference from observational studies is an area of interest to researchers, advancing rapidly over the years and with it, the methods for causal effect estimation. Among them, Targeted Maximum Likelihood Estimation (TMLE) possesses arguably the most outstanding statistical properties, and with no outright treatment for COVID-19, there was an opportunity to estimate the causal effect of dexamethasone versus hydrocortisone upon the neutrophil-lymphocyte ratio (NLR), a vital indicator for disease progression among critically ill COVID-19 patients. TMLE variations were used in the analysis. Super Learner (SL), Bayesian Additive Regression Trees (BART) and parametric regression (PAR) were implemented to estimate the average treatment effect (ATE). The study had 168 participants, 128 on dexamethasone and 40 on hydrocortisone. The mean causal difference in NLR on day 5 (ATE [95% CI]) was -0.309 [-3.800, 3.182] from SL-TMLE, 0.246 [-3.399, 3.891] from BART-TMLE, and 1.245 [-1.882, 4.372] from PAR-TMLE. The ATE of dexamethasone versus hydrocortisone on NLR was not statistically significant since the confidence interval included zero. The effect of dexamethasone is not significantly different from that of hydrocortisone on NLR in critically ill COVID-19 patients admitted to ICU. This implies that the difference in effect on NLR between the two drugs is due to random chance. TMLE remains an outstanding approach for causal analysis of observational studies with the ability to be augmented with multiple prediction approaches.

2024-11-22 — Double Machine Learning for Adaptive Causal Representation in High-Dimensional Data

Authors: Lynda Aouar, Han Yu
Year: 2024
Publication Date: 2024-11-22
Venue: arXiv.org
DOI: 10.48550/arXiv.2411.14665
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Adaptive causal representation learning from observational data is presented, integrated with an efficient sample splitting technique within the semiparametric estimating equation framework. The support points sample splitting (SPSS), a subsampling method based on energy distance, is employed for efficient double machine learning (DML) in causal inference. The support points are selected and split as optimal representative points of the full raw data in a random sample, in contrast to the traditional random splitting, and providing an optimal sub-representation of the underlying data generating distribution. They offer the best representation of a full big dataset, whereas the unit structural information of the underlying distribution via the traditional random data splitting is most likely not preserved. Three machine learning estimators were adopted for causal inference, support vector machine (SVM), deep learning (DL), and a hybrid super learner (SL) with deep learning (SDL), using SPSS. A comparative study is conducted between the proposed SVM, DL, and SDL representations using SPSS, and the benchmark results from Chernozhukov et al. (2018), which employed random forest, neural network, and regression trees with a random k-fold cross-fitting technique on the 401(k)-pension plan real data. The simulations show that DL with SPSS and the hybrid methods of DL and SL with SPSS outperform SVM with SPSS in terms of computational efficiency and the estimation quality, respectively.

2024-11-21 — Digital Gaming and Subsequent Health and Well-Being Among Older Adults: Longitudinal Outcome-Wide Analysis

Authors: A. Nakagomi, Kazushige Ide, K. Kondo, K. Shiba
Year: 2024
Publication Date: 2024-11-21
Venue: Journal of Medical Internet Research
DOI: 10.2196/69080
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Digital gaming has become increasingly popular among older adults, potentially offering cognitive, social, and physical benefits. However, its broader impact on health and well-being, particularly in real-world settings, remains unclear. Objective: This study aimed to evaluate the multidimensional effects of digital gaming on health and well-being among older adults, using data from the Japan Gerontological Evaluation Study conducted in Matsudo City, Chiba, Japan. Methods: Data were drawn from 3 survey waves (2020 prebaseline, 2021 baseline, and 2022 follow-up) of the Japan Gerontological Evaluation Study, which targets functionally independent older adults. The exposure variable, digital gaming, was defined as regular video game play and was assessed in 2021. In total, 18 outcomes across 6 domains were evaluated in 2022: domain 1 (happiness and life satisfaction), domain 2 (physical and mental health), domain 3 (meaning and purpose), domain 4 (character and virtue), domain 5 (close social relationships), and domain 6 (health behavior). Furthermore, 10 items from the Human Flourishing Index were included in domains 1-5, with 2 items for each domain. Overall flourishing was defined as the average of the means across these 5 domains. In addition, 7 items related to domains 2, 5, and 6 were assessed. The final sample consisted of 2504 participants aged 65 years or older, with questionnaires containing the Human Flourishing Index randomly distributed to approximately half of the respondents (submodule: n=1243). Consequently, we used 2 datasets for analysis. We applied targeted maximum likelihood estimation to estimate the population average treatment effects, with Bonferroni correction used to adjust for multiple testing. Results: Digital gaming was not significantly associated with overall flourishing or with any of the 5 domains from the Human Flourishing Index. Although initial analyses indicated associations between digital gaming and participation in hobby groups (mean difference=0.12, P=.005) as well as meeting with friends (mean difference=0.076, P=.02), these associations did not remain significant after applying the Bonferroni correction for multiple testing. In addition, digital gaming was not associated with increased sedentary behavior or reduced outdoor activities. Conclusions: This study provides valuable insights into the impact of digital gaming on the health and well-being of older adults in a real-world context. Although digital gaming did not show a significant association with improvements in flourishing or in the individual items across the 5 domains, it was also not associated with increased sedentary behavior or reduced outdoor activities. These findings suggest that digital gaming can be part of a balanced lifestyle for older adults, offering opportunities for social engagement, particularly through hobby groups. Considering the solitary nature of gaming, promoting social gaming opportunities may be a promising approach to enhance the positive effects of digital gaming on well-being.
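
The Bonferroni step mentioned in the abstract above can be illustrated with a short sketch. The two unadjusted P values are taken from the abstract; the number of tests (18, the stated count of outcomes) and the 0.05 family-wise level are assumptions, since the abstract does not say how many comparisons the authors adjusted for.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

ALPHA = 0.05   # assumed family-wise error rate
N_TESTS = 18   # assumed: one test per outcome listed in the abstract

# Unadjusted P values reported in the abstract.
reported_p = {"hobby groups": 0.005, "meeting with friends": 0.02}

threshold = bonferroni_threshold(ALPHA, N_TESTS)
unadjusted = {name: p < ALPHA for name, p in reported_p.items()}
adjusted = {name: p < threshold for name, p in reported_p.items()}

for name, p in reported_p.items():
    print(f"{name}: P={p}, unadjusted={unadjusted[name]}, "
          f"Bonferroni (threshold {threshold:.5f})={adjusted[name]}")
```

Under these assumptions, both associations pass the unadjusted 0.05 level but fall short of the per-test threshold of roughly 0.0028, which matches the pattern the abstract reports.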

2024-11-17 — ROS-Lighthouse: An Intrusion Detection System (IDS) in ROS using Ensemble Learning

Authors: Stevens Johnson, Abinash Borah, Anirudh Paranjothi, Johnson P. Thomas
Year: 2024
Publication Date: 2024-11-17
Venue: International Symposium on Wireless Personal Multimedia Communications
DOI: 10.1109/WPMC63271.2024.10863805
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Security concerns of Robot Operating System (ROS) have come to the limelight recently, especially as ROS is paving its way into more applications. ROS lacks built-in security features in its architecture, rendering it susceptible to attacks. Efforts have been made to secure ROS communication; however, the current landscape still poses risks. This study proposes ROS-Lighthouse, a framework based on machine learning, that probes through the data received by a ROS node to classify intrusions. An Intrusion Detection System (IDS) is created utilizing the Super Learner algorithm to identify intrusions within ROS networks. ROS-Lighthouse has been assessed and determined to detect intrusions with accuracies up to 99.907%. Additionally, recommendations and future directions for enhancing ROS-Lighthouse’s connectivity to the internet are provided.

2024-11-15 — G-computation for increasing performances of clinical trials with individual randomization and binary response

Authors: J. Keizer, R. Lenain, R. Porcher, Sarah Zoha, A. Chatton, Yohann Foucher
Year: 2024
Publication Date: 2024-11-15
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In a clinical trial, the random allocation aims to balance prognostic factors between arms, preventing true confounders. However, residual differences due to chance may introduce near-confounders. Adjusting for prognostic factors is therefore recommended, especially because of the related increase in power. In this paper, we hypothesized that G-computation associated with machine learning could be a suitable method for randomized clinical trials even with small sample sizes. It allows for flexible estimation of the outcome model, even when the covariates' relationships with outcomes are complex. Through simulations, penalized regressions (Lasso, Elasticnet) and algorithm-based methods (neural network, support vector machine, super learner) were compared. Penalized regressions reduced variance but may introduce a slight increase in bias. The associated reductions in sample size ranged from 17% to 54%. In contrast, algorithm-based methods, while effective for larger and more complex data structures, underestimated the standard deviation, especially with small sample sizes. In conclusion, G-computation with penalized models, particularly Elasticnet with splines when appropriate, represents a relevant approach for increasing the power of RCTs and accounting for potential near-confounders.

2024-11-13 — RF sensing enabled tracking of human facial expressions using machine learning algorithms

Authors: Hira Hameed, Mostafa Elsayed, Jaspreet Kaur, Muhammad Usman, Chong Tang, N. Ghadban, J. Kernec, Amir Hussain, Muhammad Imran, Qammer H. Abbasi
Year: 2024
Publication Date: 2024-11-13
Venue: Scientific Reports
DOI: 10.1038/s41598-024-75909-w
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Automatic analysis of facial expressions has emerged as a prominent research area in the past decade. Facial expressions serve as crucial indicators for understanding human behavior, enabling the identification and assessment of positive and negative emotions. Moreover, facial expressions provide insights into various aspects of mental activities, social connections, and physiological information. Currently, most facial expression detection systems rely on cameras and wearable devices. However, these methods have drawbacks, including privacy concerns, issues with poor lighting and line of sight blockage, difficulties in training with longer video sequences, computational complexities, and disruptions to daily routines. To address these challenges, this study proposes a novel and privacy-preserving human behavior recognition system that utilizes Frequency Modulated Continuous Wave (FMCW) radar combined with Machine Learning (ML) techniques for classifying facial expressions. Specifically, the study focuses on five common facial expressions: Happy, Sad, Fear, Surprise, and Neutral. The recorded data is obtained in the form of a Micro-Doppler signal, and state-of-the-art ML models such as Super Learner, Linear Discriminant Analysis, Random Forest, K-Nearest Neighbor, Long Short-Term Memory, and Logistic Regression are employed to extract relevant features. These extracted features from the radar data are then fed into ML models for classification. The results show a highly promising classification accuracy of 91%. The future applications of the proposed work will lead to advancements in technology, healthcare, security, and communication, thereby improving overall human well-being and societal functioning.

2024-11-12 — Targeted Maximum Likelihood Estimation for Integral Projection Models in Population Ecology

Authors: Yunzhe Zhou, Giles Hooker
Year: 2024
Publication Date: 2024-11-12
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Integral projection models (IPMs) are widely used to study population growth and the dynamics of demographic structure (e.g. age and size distributions) within a population. These models use data on individuals' growth, survival, and reproduction to predict changes in the population from one time point to the next and use these in turn to ask about long-term growth rates, the sensitivity of that growth rate to environmental factors, and aspects of the long term population such as how much reproduction concentrates in a few individuals; these quantities are not directly measurable from data and must be inferred from the model. Building IPMs requires us to develop models for individual fates over the next time step -- Did they survive? How much did they grow or shrink? Did they reproduce? -- conditional on their initial state as well as on environmental covariates in a manner that accounts for the unobservable quantities that we are ultimately interested in estimating. Targeted maximum likelihood estimation (TMLE) methods are particularly well-suited to a framework in which we are largely interested in the consequences of models. These build machine learning-based models that estimate the probability distribution of the data we observe and define a target of inference as a function of these. The initial estimate for the distribution is then modified by tilting in the direction of the efficient influence function to both de-bias the parameter estimate and provide more accurate inference. In this paper, we employ TMLE to develop robust and efficient estimators for properties derived from a fitted IPM. Mathematically, we derive the efficient influence function and formulate the paths for the least favorable sub-models. Empirically, we conduct extensive simulations using real data from both long term studies of Idaho steppe plant communities and experimental Rotifer populations.

2024-11-06 — Maternal Nutritional Factors Enhance Birthweight Prediction: A Super Learner Ensemble Approach

Authors: Muhammad Mursil, Hatem A. Rashwan, Pere Cavallé-Busquets, L. Santos-Calderón, Michelle M. Murphy, D. Puig
Year: 2024
Publication Date: 2024-11-06
Venue: Inf.
DOI: 10.3390/info15110714
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Birthweight (BW) is a widely used indicator of neonatal health, with low birthweight (LBW) being linked to higher risks of morbidity and mortality. Timely and precise prediction of LBW is crucial for ensuring newborn health and well-being. Despite recent machine learning advancements in BW classification based on physiological traits in the mother and ultrasound outcomes, maternal status in essential micronutrients for fetal development is yet to be fully exploited for BW prediction. This study aims to evaluate the impact of maternal nutritional factors, specifically mid-pregnancy plasma concentrations of vitamin B12, folate, and anemia on BW prediction. This study analyzed data from 729 pregnant women in Tarragona, Spain, for early BW prediction and analyzed each factor’s impact and contribution using a partial dependency plot and feature importance. Using a super learner ensemble method with tenfold cross-validation, the model achieved a prediction accuracy of 96.19% and an AUC-ROC of 0.96, outperforming single-model approaches. Vitamin B12 and folate status were identified as significant predictors, underscoring their importance in reducing LBW risk. The findings highlight the critical role of maternal nutritional factors in BW prediction and suggest that monitoring vitamin B12 and folate levels during pregnancy could enhance prenatal care and mitigate neonatal complications associated with LBW.

2024-11-01 — Examining Determinants of Transport-Related Carbon Dioxide Emissions by Novel Super Learner Algorithm

Authors: M. Kartal, U. Pata, Özer Depren
Year: 2024
Publication Date: 2024-11-01
Venue: Transportation Research Part D: Transport and Environment
DOI: 10.1016/j.trd.2024.104429
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-10-31 — Improved Heart Diseases Risk Prediction Using Optimized Super Learner Ensemble Model

Authors: Vasantha Kalyani
Year: 2024
Publication Date: 2024-10-31
Venue: International Journal of Intelligent Engineering and Systems
DOI: 10.22266/ijies2024.1031.25
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Cardiovascular diseases (CVD) have become a serious concern for humans, as the fatality rate due to CVD is increasing at an alarming pace. With the aid of machine learning techniques, heart illnesses can be predicted much earlier, and therapy or dietary changes can prevent deaths. By combining predictions from various individual models, the machine learning technique known as ensemble learning improves forecasting accuracy and resiliency. In this work, a Super Learner Ensemble Model is used where the base learners are a diverse combination of linear, probabilistic, bagging, boosting and stacking models. To improve the performance of the Super Learner Ensemble Model, an Optimized Super Learner Ensemble Model (OSLEM) is proposed, where optimal selection of base learners in the ensemble is done based on the pairwise disagreement accuracy diversity measure of classifiers in each best fitness whale obtained by different iterations of Whale Optimization Algorithm (WOA).

2024-10-28 — The effects of hypothetical psychological interventions on alcohol use in European young adults

Authors: Stefanie Do, C. Börnhorst, V. Didelez, W. Ahrens, A. Hebestreit, IDEFICS and I.Family consortia
Year: 2024
Publication Date: 2024-10-28
Venue: European Journal of Public Health
DOI: 10.1093/eurpub/ckae144.985
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Low psychosocial well-being (PWB) and high emotion-driven impulsiveness (EDI) are associated with alcohol. Yet, it is unclear whether a hypothetical intervention targeting one or the other in adolescence might be more effective in reducing alcohol consumption (AC) in young adulthood. Therefore, we aimed to compare the separate causal effects of PWB and EDI in adolescence on AC in young adulthood. Methods: We included 505 European young adults from the IDEFICS/I.Family cohort (mean age: 20.2 years; age range: 18.2-23.5 years) who did not drink alcohol at study entry. AC was operationalized as the amount of weekly consumed alcoholic beverages (mean: 4.2 drinks per week; range: 0.3-70 drinks per week). EDI was assessed using the negative urgency subscale from the UPPS-P Impulsive Behaviour Scale. PWB was assessed using the KINDLR Health-Related Quality of Life Questionnaire. Following the principles of target trial emulation, we estimated, separately, the average causal effects of PWB and EDI on AC accounting for relevant confounders and applying a semi-parametric doubly robust method (targeted maximum likelihood estimation). We stratified the results by sex and parental education. Results: If all adolescents, hypothetically, had high levels of PWB, compared to low levels, we estimated a decrease in the average amount of alcoholic beverages in young adulthood by 0.1 drinks per week [95%-confidence interval: -2.3; 2.1]. Furthermore, if all adolescents had low levels of EDI, compared to high levels, we estimated an increase in alcoholic beverages in young adulthood by 1.5 drinks per week [0.1; 2.9]. Different effects for sex and parental education groups were found. Conclusions: Hypothetical interventions targeting PWB in adolescents were not found to have effects on reducing AC in young adulthood. Interventions targeting EDI, however, would lead to an increase in AC. This may be due to unmeasured confounding and the missing distinction between drinking motives. Key messages: • We demonstrate that causal inference methods, compared to traditional ones, improve the robustness of estimated effect measures and address important sources of bias in European cohort data. • To inform public health interventions on reducing alcohol consumption, future research should investigate different drinking motives, e.g. alleviating negative emotions or enhancing positive emotions.

2024-10-27 — Longitudinal impact of different treatment sequences of second-generation antipsychotics on metabolic outcomes: a study using targeted maximum likelihood estimation

Authors: Yaning Feng, Kenneth C Y Wong, Perry Bok-Man Leung, Benedict K.W. Lee, P. Sham, S. S. Lui, Hon-Cheong So
Year: 2024
Publication Date: 2024-10-27
Venue: Psychological Medicine
DOI: 10.1017/S0033291725000935
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Second-generation antipsychotics (SGAs) cause metabolic side effects. However, patients’ metabolic profiles were influenced by time-invariant and time-varying confounders. Real-world evidence on the long-term, dynamic effects of SGAs (e.g. different treatment sequences) is limited. We employed advanced causal inference methods to evaluate the metabolic impact of SGAs in a naturalistic cohort. Methods: We followed 696 Chinese patients with schizophrenia-spectrum disorders receiving SGAs. Longitudinal targeted maximum likelihood estimation (LTMLE) was used to estimate the average treatment effects (ATEs) of continuous SGA treatment versus ‘no treatment’ on metabolic outcomes, including total cholesterol (TC), high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglyceride (TG), fasting glucose (FG), and body mass index (BMI), over 6–18 months at 3-month intervals. LTMLE accounted for time-invariant and time-varying confounders. Post-SGA discontinuation side effects were also assessed. Results: The ATEs of continuous SGA treatment on BMI and TG showed an inverted U-shaped pattern, peaking at 12 months and declining afterwards. Similar patterns were observed for TC and LDL, albeit the ATEs peaked at 15 months. For FG and HDL, the ATEs peaked at ~6 months. The adverse impact of SGAs on BMI persisted even after medication discontinuation, yet other metabolic parameters did not show such lingering side effects. Clozapine and olanzapine exhibited greater metabolic side effects compared to other SGAs. Conclusions: Our real-world study suggests that metabolic side effects may stabilize with prolonged continuous treatment. Clozapine and olanzapine confer higher cardiometabolic risks than other SGAs. The side effects of SGAs on BMI may persist after drug discontinuation. These insights may guide antipsychotic choice and improve management of metabolic side effects.

2024-10-25 — trajmsm: An R package for Trajectory Analysis and Causal Modeling

Authors: Awa Diop, C. Sirois, J. Guertin, Mireille E. Schnitzer, James M. Brophy, Denis Talbot
Year: 2024
Publication Date: 2024-10-25
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
The R package trajmsm provides functions designed to simplify the estimation of the parameters of a model combining latent class growth analysis (LCGA), a trajectory analysis technique, and marginal structural models (MSMs) called LCGA-MSM. LCGA summarizes similar patterns of change over time into a few distinct categories called trajectory groups, which are then included as "treatments" in the MSM. MSMs are a class of causal models that correctly handle treatment-confounder feedback. The parameters of LCGA-MSMs can be consistently estimated using different estimators, such as inverse probability weighting (IPW), g-computation, and pooled longitudinal targeted maximum likelihood estimation (pooled LTMLE). These three estimators of the parameters of LCGA-MSMs are currently implemented in our package. In the context of a time-dependent outcome, we previously proposed a combination of LCGA and history-restricted MSMs (LCGA-HRMSMs). Our package provides additional functions to estimate the parameters of such models. Version 0.1.3 of the package is currently available on CRAN.

2024-10-24 — Doubly Robust Nonparametric Efficient Estimation for Provider Evaluation

Authors: Herbert P Susmann, Yiting Li, M. McAdams‐DeMarco, Iván Díaz, Wenbo Wu
Year: 2024
Publication Date: 2024-10-24
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Provider profiling has the goal of identifying healthcare providers with exceptional patient outcomes. When evaluating providers, adjustment is necessary to control for differences in case-mix between different providers. Direct and indirect standardization are two popular risk adjustment methods. In causal terms, direct standardization examines a counterfactual in which the entire target population is treated by one provider. Indirect standardization, commonly expressed as a standardized outcome ratio, examines the counterfactual in which the population treated by a provider had instead been randomly assigned to another provider. Our first contribution is to present nonparametric efficiency bounds for directly and indirectly standardized provider metrics by deriving their efficient influence functions. Our second contribution is to propose fully nonparametric estimators based on targeted minimum loss-based estimation that achieve the efficiency bounds. The finite-sample performance of the estimators is investigated through simulation studies. We apply our methods to evaluate dialysis facilities in New York State in terms of unplanned readmission rates using a large Medicare claims dataset. A software implementation of our methods is available in the R package TargetedRisk.

2024-10-23 — A Super Learner Ensemble-based Intrusion Detection System to Mitigate Network Attacks

Authors: Ojo John Ajayi, A. Sodiya, M. Bagiwa, T. A. Olowookere
Year: 2024
Publication Date: 2024-10-23
Venue: 2024 5th International Conference on Data Analytics for Business and Industry (ICDABI)
DOI: 10.1109/ICDABI63787.2024.10800423
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Governments and corporate institutions are now mostly reliant on integrated digital infrastructures. These digital infrastructures are usually targets of cyber threats such as intrusion, for which intrusion detection systems (IDS) have emerged. One of the key needs for a robust IDS includes reducing the rate of false positives and thus improving accuracy. In this study, three traditional machine learning (ML) algorithms, including K-Nearest Neighbor (KNN), Naive Bayes (NB), and Decision Tree (DT), and three ensemble Machine Learning (ML) algorithms, including Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGBOOST), were used on the UNSW-NB15 dataset from the Australian Centre for Cyber Security’s Cyber Range Lab, to train intrusion detection models. A super-learner ensemble model was then built using the best two ensemble models (XGBOOST and RF) along with the best traditional model (KNN) as its base learners. The super-learner ensemble model was able to reduce false positives and improve detection accuracy with 98% accuracy. The model was then deployed in an IDS application to mitigate network attacks effectively and efficiently.

2024-10-19 — HIGHLY ADAPTIVE LASSO: MACHINE LEARNING THAT PROVIDES VALID NONPARAMETRIC INFERENCE IN REALISTIC MODELS

Authors: Zachary Butzin-Dozier, Sky Qiu, Alan E. Hubbard, Junming Shi, Mark van der Laan
Year: 2024
Publication Date: 2024-10-19
Venue: medRxiv
DOI: 10.1101/2024.10.18.24315778
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, tmle

Abstract:
Understanding treatment effects on health-related outcomes using real-world data requires defining a causal parameter and imposing relevant identification assumptions to translate it into a statistical estimand. Semiparametric methods, like the targeted maximum likelihood estimator (TMLE), have been developed to construct asymptotically linear estimators of these parameters. To further establish the asymptotic efficiency of these estimators, two conditions must be met: 1) the relevant components of the data likelihood must fall within a Donsker class, and 2) the estimates of nuisance parameters must converge to their true values at a rate faster than n^(-1/4). The Highly Adaptive LASSO (HAL) satisfies these criteria by acting as an empirical risk minimizer within a class of cadlag functions with a bounded sectional variation norm, which is known to be Donsker. HAL achieves the desired rate of convergence, thereby guaranteeing the estimators' asymptotic efficiency. The function class over which HAL minimizes its risk is flexible enough to capture realistic functions while maintaining the conditions for establishing efficiency. Additionally, HAL enables robust inference for non-pathwise differentiable parameters, such as the conditional average treatment effect (CATE) and causal dose-response curve, which are important in precision health. While these parameters are often considered in machine learning literature, these applications typically lack proper statistical inference. HAL addresses this gap by providing reliable statistical uncertainty quantification that is essential for informed decision-making in health research.

2024-10-08 — SSRI use during acute COVID-19 and risk of Long COVID among patients with depression

Authors: Zachary Butzin-Dozier, Yunwen Ji, Sarang Deshpande, Eric Hurwitz, A. Anzalone, Jeremy Coyle, Junming Shi, Andrew Mertens, Mark van der Laan, J. Colford, Rena C Patel, Alan E. Hubbard
Year: 2024
Publication Date: 2024-10-08
Venue: BMC Medicine
DOI: 10.1186/s12916-024-03655-x
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Long COVID, also known as post-acute sequelae of COVID-19 (PASC), is a poorly understood condition with symptoms across a range of biological domains that often have debilitating consequences. Some have recently suggested that lingering SARS-CoV-2 virus particles in the gut may impede serotonin production and that low serotonin may drive many Long COVID symptoms across a range of biological systems. Therefore, selective serotonin reuptake inhibitors (SSRIs), which increase synaptic serotonin availability, may be used to prevent or treat Long COVID. SSRIs are commonly prescribed for depression, therefore restricting a study sample to only include patients with depression can reduce the concern of confounding by indication. In an observational sample of electronic health records from patients in the National COVID Cohort Collaborative (N3C) with a COVID-19 diagnosis between September 1, 2021, and December 1, 2022, and a comorbid depressive disorder, the leading indication for SSRI use, we evaluated the relationship between SSRI use during acute COVID-19 and subsequent 12-month risk of Long COVID (defined by ICD-10 code U09.9). We defined SSRI use as a prescription for SSRI medication beginning at least 30 days before acute COVID-19 and not ending before SARS-CoV-2 infection. To minimize bias, we estimated relationships using nonparametric targeted maximum likelihood estimation to aggressively adjust for high-dimensional covariates. We analyzed a sample (n = 302,626) of patients with a diagnosis of a depressive condition before COVID-19 diagnosis, where 100,803 (33%) were using an SSRI. We found that SSRI users had a significantly lower risk of Long COVID compared to nonusers (adjusted causal relative risk 0.92, 95% CI (0.86, 0.99)) and we found a similar relationship comparing new SSRI users (first SSRI prescription 1 to 4 months before acute COVID-19 with no prior history of SSRI use) to nonusers (adjusted causal relative risk 0.89, 95% CI (0.80, 0.98)). 
These findings suggest that SSRI use during acute COVID-19 may be protective against Long COVID, supporting the hypothesis that serotonin may be a key mechanistic biomarker of Long COVID.

2024-10-04 — Development of a prediction model for 30-day COVID-19 hospitalization and death in a national cohort of Veterans Health Administration patients–March 2022—April 2023

Authors: D. Bui, K. Bajema, Yuan Huang, Lei Yan, Yuli Li, N. Rajeevan, K. Berry, M. Rowneki, Stephanie Argraves, Denise M. Hynes, Grant D Huang, Mihaela Aslan, G. Ioannou
Year: 2024
Publication Date: 2024-10-04
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0307235
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Objective: The epidemiology of COVID-19 has substantially changed since its emergence given the availability of effective vaccines, circulation of different viral variants, and re-infections. We aimed to develop models to predict 30-day COVID-19 hospitalization and death in the Omicron era for contemporary clinical and research applications. Methods: We used comprehensive electronic health records from a national cohort of patients in the Veterans Health Administration (VHA) who tested positive for SARS-CoV-2 between March 1, 2022, and March 31, 2023. Full models incorporated 84 predictors, including demographics, comorbidities, and receipt of COVID-19 vaccinations and anti-SARS-CoV-2 treatments. Parsimonious models included 19 predictors. We created models for 30-day hospitalization or death, 30-day hospitalization, and 30-day all-cause mortality. We used the Super Learner ensemble machine learning algorithm to fit prediction models. Model performance was assessed with the area under the receiver operating characteristic curve (AUC), Brier scores, and calibration intercepts and slopes in a 20% holdout dataset. Results: Models were trained and tested on 198,174 patients, of whom 8% were hospitalized or died within 30 days of testing positive. AUCs for the full models ranged from 0.80 (hospitalization) to 0.91 (death). Brier scores were close to 0, with the lowest error in the mortality model (Brier score: 0.01). All three models were well calibrated with calibration intercepts <0.23 and slopes <1.05. Parsimonious models performed comparably to full models. Conclusions: We developed prediction models that accurately estimate COVID-19 hospitalization and mortality risk following emergence of the Omicron variant and in the setting of COVID-19 vaccinations and antiviral treatments. These models may be used for risk stratification to inform COVID-19 treatment and to identify high-risk patients for inclusion in clinical trials.
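Illustrative note: the two headline metrics named in this abstract, the Brier score and the AUC, can be computed from held-out outcomes and predicted probabilities in a few lines. The sketch below is a minimal pure-Python illustration on made-up toy data; the function names `brier_score` and `auc` are hypothetical and this is not the study's code.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true, p_hat):
    """Probability that a random case is ranked above a random non-case (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up holdout data for illustration only (not from the study).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(brier_score(y, p), auc(y, p))
```

A Brier score near 0 on a rare outcome (8% here) can be reached by a model that predicts low risk for everyone, which is presumably why the abstract reports it alongside AUC and calibration rather than alone.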

2024-10-01 — Super Learner Algorithm for Carotid Artery Disease Diagnosis: A Machine Learning Approach Leveraging Craniocervical CT Angiography

Authors: H. I. Özdemir, K. G. Atman, H. Şirin, A. Çalık, Ibrahim Senturk, Metin Bilge, Ismail Oran, Duygu Bilge, Celal Çınar
Year: 2024
Publication Date: 2024-10-01
Venue: Tomography
DOI: 10.3390/tomography10100120
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study introduces a machine learning (ML) approach to diagnosing carotid artery diseases, including stenosis, aneurysm, and dissection, by leveraging craniocervical computed tomography angiography (CTA) data. A meticulously curated, balanced dataset of 122 patient cases was used, ensuring reproducibility and data quality, and this is publicly accessible at (insert dataset location). The proposed method integrates a super learner model which combines adaptive boosting, gradient boosting, and random forests algorithms, achieving an accuracy of 90%. To enhance model robustness and generalization, techniques such as k-fold cross-validation, bootstrapping, data augmentation, and the synthetic minority oversampling technique (SMOTE) were applied, expanding the dataset to 1000 instances and significantly improving performance for minority classes like aneurysm and dissection. The results highlight the pivotal role of blood vessel structural analysis in diagnosing carotid artery diseases and demonstrate the superior performance of the super learner model in comparison with state-of-the-art (SOTA) methods in terms of both accuracy and robustness. This manuscript outlines the methodology, compares the results with state-of-the-art approaches, and provides insights for future research directions in applying machine learning to medical diagnostics.

2024-10-01 — Machine learning in causal inference for epidemiology

Authors: C. Moccia, G. Moirano, M. Popović, C. Pizzi, Piero Fariselli, L. Richiardi, C. Ekstrøm, Milena Maule
Year: 2024
Publication Date: 2024-10-01
Venue: European Journal of Epidemiology
DOI: 10.1007/s10654-024-01173-x
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In causal inference, parametric models are usually employed to address causal questions estimating the effect of interest. However, parametric models rely on the correct model specification assumption that, if not met, leads to biased effect estimates. Correct model specification is challenging, especially in high-dimensional settings. Incorporating Machine Learning (ML) into causal analyses may reduce the bias arising from model misspecification, since ML methods do not require the specification of a functional form of the relationship between variables. However, when ML predictions are directly plugged in a predefined formula of the effect of interest, there is the risk of introducing a “plug-in bias” in the effect measure. To overcome this problem and to achieve useful asymptotic properties, new estimators that combine the predictive potential of ML and the ability of traditional statistical methods to make inference about population parameters have been proposed. For epidemiologists interested in taking advantage of ML for causal inference investigations, we provide an overview of three estimators that represent the current state-of-art, namely Targeted Maximum Likelihood Estimation (TMLE), Augmented Inverse Probability Weighting (AIPW) and Double/Debiased Machine Learning (DML).
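Illustrative note: the "plug-in bias" contrast in this abstract can be written out for the average treatment effect. The notation below is an assumption made for illustration and is not taken from the paper: W are covariates, A is a binary treatment, Y is the outcome, \hat{Q}(a, W) is an ML estimate of E[Y | A = a, W], and \hat{g}(W) is an ML estimate of P(A = 1 | W).

```latex
% Plug-in (g-computation) estimator: ML predictions inserted directly.
\hat{\psi}_{\text{plug-in}}
  = \frac{1}{n} \sum_{i=1}^{n} \bigl[ \hat{Q}(1, W_i) - \hat{Q}(0, W_i) \bigr]

% AIPW: the plug-in estimator plus an inverse-probability-weighted
% residual correction that removes the first-order plug-in bias.
\hat{\psi}_{\text{AIPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \Bigl[ \hat{Q}(1, W_i) - \hat{Q}(0, W_i)
    + \frac{A_i}{\hat{g}(W_i)} \bigl( Y_i - \hat{Q}(1, W_i) \bigr)
    - \frac{1 - A_i}{1 - \hat{g}(W_i)} \bigl( Y_i - \hat{Q}(0, W_i) \bigr) \Bigr]
```

TMLE targets the same correction differently: it updates \hat{Q} in a fluctuation step so that the weighted-residual term averages to zero, and then reports the plug-in estimate from the updated fit.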

2024-10-01 — Long-Term Dynamic Effect of Body Mass Index on Adverse Cardiovascular Outcomes with the Targeted Maximum Likelihood Estimation Method: Results from the KNOW-CKD Study

Authors: Yun Jung Oh, Kook-Hwan Oh, Wookyung Chung, Ji Yong Jung
Year: 2024
Publication Date: 2024-10-01
Venue: Journal of the American Society of Nephrology
DOI: 10.1681/asn.2024twy835r3
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2024-10-01 — Association of Crop Commercialization and Rural Households' Multidimensional Poverty Using Targeted Maximum Likelihood Estimation

Authors: Anteneh Mulugeta Eyasu, T. Zewotir, Zelalem G. Dessie
Year: 2024
Publication Date: 2024-10-01
Venue: Scientific African
DOI: 10.1016/j.sciaf.2024.e02422
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2024-10-01 — 9221 Reproductive Cancer Risk in Patients Treated with Denosumab Compared with Alendronate: A Population-based Cohort Study

Authors: S. Yahyavi, C. Selmer, Christian Torp-Pedersen, A. Juul, Martin Blomberg Jensen
Year: 2024
Publication Date: 2024-10-01
Venue: Journal of the Endocrine Society
DOI: 10.1210/jendso/bvae163.1659
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract Disclosure: S.K. Yahyavi: Research Investigator; Self; Principal investigator (PI) on an RCT with denosumab. C. Selmer: None. C. Torp-Pedersen: None. A. Juul: None. M. Blomberg Jensen: Other; Self; Holds two patents on the use of RANKL inhibitors to treat male infertility. Background and Objective: Antiresorptive treatment is used in millions of patients with osteoporosis and cancer but during the early studies of denosumab there was a slight increase in ovarian cancer incidence. The aim of this study is to determine the association between use of denosumab and risk of reproductive cancers compared with the use of alendronate in both men and women. Methods: Using a retrospective study design, we combined the Danish registries and identified a population of subjects > 50 years of age. We compared users of Denosumab that had been on alendronate treatment for at least 6 months with a matched population of patients that had been treated for at least 6 months with alendronate alone. Using the L-TMLE method we estimate the risk of reproductive cancers and the risk difference between the groups. Secondary analysis included comparisons with a healthy background population. Results: In the main analysis, a total of 18,162 subjects were included, with 6,054 denosumab users matched 1:2 with 12,108 alendronate users and followed for 3 years. 727 women and 183 men were diagnosed with a reproductive cancer during follow-up. Use of denosumab was not associated with higher risk of reproductive cancer. Compared to alendronate, women who received treatment with denosumab had a 0.06% (95% CI -0.12%; 0.26%) higher risk of a cancer diagnosis after 3 years of treatment. In a model fully adjusted for socioeconomic factors and comorbidities the risk was 0.01% (95% CI -0.35%; 0.37%) higher. The same results were found in men, and when comparing with a healthy background population and sensitivity analysis only using CKD measurements the results were confirmed. 
Conclusion: When comparing treatments of denosumab and alendronate, this study finds no increased risk of either cancers overall or specific reproductive cancers in men or women. Presentation: 6/3/2024

2024-10-01 — 8384 Reproductive Cancer Risk in Patients Treated with Denosumab Compared with Alendronate: A Population-based Cohort Study

Authors: S. Yahyavi, C. Selmer, Christian Torp-Pedersen, A. Juul, Martin Blomberg Jensen
Year: 2024
Publication Date: 2024-10-01
Venue: Journal of the Endocrine Society
DOI: 10.1210/jendso/bvae163.1660
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract Disclosure: S.K. Yahyavi: Research Investigator; Self; Principal investigator (PI) on an RCT with denosumab. C. Selmer: None. C. Torp-Pedersen: None. A. Juul: None. M. Blomberg Jensen: Other; Self; Holds two patents on the use of RANKL inhibitors to treat male infertility. Background and Objective: Antiresorptive treatment is used in millions of patients with osteoporosis and cancer but during the early studies of denosumab there was a slight increase in ovarian cancer incidence. The aim of this study is to determine the association between use of denosumab and risk of reproductive cancers compared with the use of alendronate in both men and women. Methods: Using a retrospective study design, we combined the Danish registries and identified a population of subjects > 50 years of age. We compared users of Denosumab that had been on alendronate treatment for at least 6 months with a matched population of patients that had been treated for at least 6 months with alendronate alone. Using the L-TMLE method we estimate the risk of reproductive cancers and the risk difference between the groups. Secondary analysis included comparisons with a healthy background population. Results: In the main analysis, a total of 18,162 subjects were included, with 6,054 denosumab users matched 1:2 with 12,108 alendronate users and followed for 3 years. 727 women and 183 men were diagnosed with a reproductive cancer during follow-up. Use of denosumab was not associated with higher risk of reproductive cancer. Compared to alendronate, women who received treatment with denosumab had a 0.06% (95% CI -0.12%; 0.26%) higher risk of a cancer diagnosis after 3 years of treatment. In a model fully adjusted for socioeconomic factors and comorbidities the risk was 0.01% (95% CI -0.35%; 0.37%) higher. The same results were found in men, and when comparing with a healthy background population and sensitivity analysis only using CKD measurements the results were confirmed. 
Conclusion: When comparing treatments of denosumab and alendronate, this study finds no increased risk of either cancers overall or specific reproductive cancers in men or women. Presentation: 6/3/2024

2024-09-27 — Super learner model for classifying leukemia through gene expression monitoring

Authors: Sharanya Selvaraj, Alhuseen Omar Alsayed, Nor Azman Ismail, B. P. Kavin, Edeh Michael Onyema, Gan Hong Seng, Arinze Queen Uchechi
Year: 2024
Publication Date: 2024-09-27
Venue: Discover Oncology
DOI: 10.1007/s12672-024-01337-x
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Leukemia is a form of cancer that affects the bone marrow and lymphatic system, and it requires complex treatment strategies that vary with each subtype. Due to the subtle morphological differences among these types, monitoring gene expressions is crucial for accurate classification. Manual or pathological testing can be time-consuming and expensive. Therefore, data-driven methods and machine learning algorithms offer an efficient alternative for leukemia classification. This study introduced a novel super learning model that leverages heterogeneous machine learning models to analyze gene expression data and classify leukemia cells. The proposed approach incorporates an entropy-based feature importance technique to identify the gene profiles most significant to the labeling process. The strength of this super learning model lies in its final super learner, Random Forest, which effectively classifies cross-validated data from the candidate learners. Validation on a gene expression monitoring dataset demonstrates that this model outperforms other state-of-the-art models in predictive accuracy. The study contributes to the knowledge regarding the use of advanced machine learning techniques to improve the accuracy and reliability of leukemia classification using gene expression data, addressing the challenges of traditional methods that rely on clinical features and morphological examination.

2024-09-27 — Suicide Death Predictive Models using Electronic Health Record Data

Authors: S. Srikanth, L. Montoya, M. M. Turnure, B. Pence, N. Fulcher, B. Gaynes, D. Goldston, T. Carey, S. Ranapurwala
Year: 2024
Publication Date: 2024-09-27
Venue: medRxiv
DOI: 10.1101/2024.09.26.24314402
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In the realm of medical research, particularly in the study of suicide risk assessment, the integration of machine learning techniques with traditional statistics methods has become increasingly prevalent. This paper used data from the UNC EHR system from 2006 to 2020 to build models to predict suicide-related death. The dataset, with 1021 cases and 10185 controls, consisted of demographic variables and short-term information on the subject's prior diagnosis and healthcare utilization. We examined the efficacy of the super learner ensemble method in predicting suicide-related death, leveraging its capability to combine multiple predictive algorithms without the necessity of pre-selecting a single model. The study compared the performance of the super learner against five base models, demonstrating its superiority in terms of cross-validated negative log-likelihood scores. The super learner improved upon the best algorithm by 60% and the worst algorithm by 97.5%. We also compared the cross-validated AUCs of the models optimized to have the best AUC to highlight the importance of the choice of risk function. The results highlight the potential of the super learner in complex predictive tasks in medical research, although considerations of computational expense and model complexity must be carefully managed.
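Illustrative note: the role of the cross-validated risk function can be sketched with a discrete super learner, which selects the single candidate with the lowest cross-validated negative log-likelihood (the ensemble version in the abstract instead weights the candidates). Everything below is hypothetical: the two candidate learners, the toy data, and the function names are assumptions for illustration, not the five base models or the data used in the study.

```python
import math


def neg_log_lik(y, p, eps=1e-12):
    """Negative Bernoulli log-likelihood of one 0/1 outcome, with clipping."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fit_marginal(train):
    """Hypothetical candidate 1: predict the overall event rate for everyone."""
    rate = sum(y for _, y in train) / len(train)
    return lambda x: rate


def fit_stratified(train):
    """Hypothetical candidate 2: smoothed event rate within each level of x."""
    overall = sum(y for _, y in train) / len(train)
    rates = {}
    for level in {x for x, _ in train}:
        ys = [y for x, y in train if x == level]
        rates[level] = (sum(ys) + 0.5) / (len(ys) + 1)
    return lambda x: rates.get(x, overall)


def cv_risk(fit, data, k=5):
    """Average negative log-likelihood over k validation folds."""
    total = 0.0
    for fold in range(k):
        train = [d for i, d in enumerate(data) if i % k != fold]
        valid = [d for i, d in enumerate(data) if i % k == fold]
        predict = fit(train)
        total += sum(neg_log_lik(y, predict(x)) for x, y in valid)
    return total / len(data)


# Toy (x, y) pairs, made up for illustration: x = 1 has a 75% event rate, x = 0 has 25%.
data = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1)] * 5
candidates = {"marginal": fit_marginal, "stratified": fit_stratified}
risks = {name: cv_risk(fit, data) for name, fit in candidates.items()}
best = min(risks, key=risks.get)
print(risks, best)
```

Swapping `neg_log_lik` for a different loss changes which candidate is selected in general, which is the point the abstract makes about the choice of risk function.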

2024-09-25 — Predicting Suicides Among US Army Soldiers After Leaving Active Service.

Authors: Chris J Kennedy, Jaclyn C. Kearns, Joseph C. Geraci, Sarah M Gildea, I. Hwang, A. King, Howard Liu, Alex Luedtke, Brian P. Marx, Santiago Papini, M. Petukhova, N. Sampson, J. Smoller, Charles J. Wolock, Nur Hani Zainal, Murray B. Stein, R. Ursano, James R Wagner, Ronald C Kessler
Year: 2024
Publication Date: 2024-09-25
Venue: JAMA psychiatry
DOI: 10.1001/jamapsychiatry.2024.2744
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Importance The suicide rate of military servicemembers increases sharply after returning to civilian life. Identifying high-risk servicemembers before they leave service could help target preventive interventions. Objective To develop a model based on administrative data for regular US Army soldiers that can predict suicides 1 to 120 months after leaving active service. Design, Setting, and Participants In this prognostic study, a consolidated administrative database was created for all regular US Army soldiers who left service from 2010 through 2019. Machine learning models were trained to predict suicides over the next 1 to 120 months in a random 70% training sample. Validation was implemented in the remaining 30%. Data were analyzed from March 2023 through March 2024. Main outcome and measures The outcome was suicide in the National Death Index. Predictors came from administrative records available before leaving service on sociodemographics, Army career characteristics, psychopathologic risk factors, indicators of physical health, social networks and supports, and stressors. Results Of the 800 579 soldiers in the cohort (84.9% male; median [IQR] age at discharge, 26 [23-33] years), 2084 suicides had occurred as of December 31, 2019 (51.6 per 100 000 person-years). A lasso model assuming consistent slopes over time discriminated as well over all but the shortest risk horizons as more complex stacked generalization ensemble machine learning models. Test sample area under the receiver operating characteristic curve ranged from 0.87 (SE = 0.06) for suicides in the first month after leaving service to 0.72 (SE = 0.003) for suicides over 120 months. The 10% of soldiers with highest predicted risk accounted for between 30.7% (SE = 1.8) and 46.6% (SE = 6.6) of all suicides across horizons. Calibration was for the most part better for the lasso model than the super learner model (both estimated over 120-month horizons.) 
Net benefit of a model-informed prevention strategy was positive compared with intervene-with-all or intervene-with-none strategies over a range of plausible intervention thresholds. Sociodemographics, Army career characteristics, and psychopathologic risk factors were the most important classes of predictors. Conclusions and relevance These results demonstrated that a model based on administrative variables available at the time of leaving active Army service can predict suicides with meaningful accuracy over the subsequent decade. However, final determination of cost-effectiveness would require information beyond the scope of this report about intervention content, costs, and effects over relevant horizons in relation to the monetary value placed on preventing suicides.
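
Illustrative note: the abstract above reports the share of all events that fall among the 10% of people with the highest predicted risk. The sketch below shows one way such a figure could be computed. The function name `top_fraction_capture` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def top_fraction_capture(risk, event, top=0.10):
    """Share of all events found among the `top` fraction with highest predicted risk."""
    risk = np.asarray(risk, dtype=float)
    event = np.asarray(event)
    k = max(1, int(np.ceil(top * len(risk))))   # size of the high-risk group
    top_idx = np.argsort(-risk)[:k]             # indices of the k highest risks
    return event[top_idx].sum() / event.sum()

# Toy data (made up): 10 people, 2 events, one of them in the top decile.
risk = np.arange(1, 11) / 10.0
event = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])
share = top_fraction_capture(risk, event, top=0.10)
print(share)
```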

2024-09-23 — Forecasting the cost of drought events in France by Super Learning from a short time series of many slightly dependent data

Authors: Geoffrey Ecoto, Aurélien F. Bibaut, A. Chambaz
Year: 2024
Publication Date: 2024-09-23
Venue: Computational Statistics
DOI: 10.1007/s00180-024-01549-3
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2024-09-17 — Performance of Cross-Validated Targeted Maximum Likelihood Estimation

Authors: Matthew J Smith, Rachael V. Phillips, C. Maringe, Miguel Angel Luque Fernandez
Year: 2024
Publication Date: 2024-09-17
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Background: Advanced methods for causal inference, such as targeted maximum likelihood estimation (TMLE), require certain conditions for statistical inference. However, in situations where there is not differentiability due to data sparsity or near-positivity violations, the Donsker class condition is violated. In such situations, TMLE variance can suffer from inflation of the type I error and poor coverage, leading to conservative confidence intervals. Cross-validation of the TMLE algorithm (CVTMLE) has been suggested to improve on performance compared to TMLE in settings of positivity or Donsker class violations. We aim to investigate the performance of CVTMLE compared to TMLE in various settings. Methods: We utilised the data-generating mechanism as described in Leger et al. (2022) to run a Monte Carlo experiment under different Donsker class violations. Then, we evaluated the respective statistical performances of TMLE and CVTMLE with different super learner libraries, with and without regression tree methods. Results: We found that CVTMLE vastly improves confidence interval coverage without adversely affecting bias, particularly in settings with small sample sizes and near-positivity violations. Furthermore, incorporating regression trees using standard TMLE with ensemble super learner-based initial estimates increases bias and variance leading to invalid statistical inference. Conclusions: It has been shown that when using CVTMLE the Donsker class condition is no longer necessary to obtain valid statistical inference when using regression trees and under either data sparsity or near-positivity violations. We show through simulations that CVTMLE is much less sensitive to the choice of the super learner library and thereby provides better estimation and inference in cases where the super learner library uses more flexible candidates and is prone to overfitting.

2024-09-10 — Comparing methods for estimating causal treatment effects of administrative health data: A plasmode simulation study

Authors: V. Ress, Eva-Maria Wild
Year: 2024
Publication Date: 2024-09-10
Venue: Health Economics
DOI: 10.1002/hec.4891
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Estimating the causal effects of health policy interventions is crucial for policymaking but is challenging when using real-world administrative health care data due to a lack of methodological guidance. To help fill this gap, we conducted a plasmode simulation using such data from a recent policy initiative launched in a deprived urban area in Germany. Our aim was to evaluate and compare the following methods for estimating causal effects: propensity score matching, inverse probability of treatment weighting, and entropy balancing, all combined with difference-in-differences analysis, augmented inverse probability weighting, and targeted maximum likelihood estimation. Additionally, we estimated nuisance parameters using regression models and an ensemble learner called superlearner. We focused on treatment effects related to the number of physician visits, total health care cost, and hospitalization. While each approach has its strengths and weaknesses, our results demonstrate that the superlearner generally worked well for handling nuisance terms in large covariate sets when combined with doubly robust estimation methods to estimate the causal contrast of interest. In contrast, regression-based nuisance parameter estimation worked best in small covariate sets when combined with singly robust methods.

2024-09-09 — Lactated Ringer vs Normal Saline Solution During Sickle Cell Vaso-Occlusive Episodes

Authors: A.K. Alwang, A. Law, Elizabeth S. Klings, Robyn T Cohen, N. Bosch
Year: 2024
Publication Date: 2024-09-09
Venue: JAMA Internal Medicine
DOI: 10.1001/jamainternmed.2024.4428
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Importance Sickle cell disease (SCD), a clinically heterogenous genetic hemoglobinopathy, is characterized by painful vaso-occlusive episodes (VOEs) that can require hospitalization. Patients admitted with VOEs are often initially resuscitated with normal saline (NS) to improve concurrent hypovolemia, despite preclinical evidence that NS may promote erythrocyte sickling. The comparative effectiveness of alternative volume-expanding fluids (eg, lactated Ringer [LR]) for resuscitation during VOEs is unclear. Objective To compare the effectiveness of LR to NS fluid resuscitation in patients with SCD and VOEs. Design, Setting, and Participants This multicenter cohort study and target trial emulation included inpatient adults with SCD VOEs who received either LR or NS on hospital day 1. The Premier PINC AI database (2016-2022), a multicenter clinical database including approximately 25% of US hospitalizations was used. The analysis took place between October 6, 2023, and June 20, 2024. Exposure Receipt of LR (intervention) or NS (control) on hospital day 1. Main Outcome and Measures The primary outcome was hospital-free days (HFDs) by day 30. Targeted maximum likelihood estimation was used to calculate marginal effect estimates. Heterogeneity of treatment effect was explored in subgroups. Results A total of 55 574 patient encounters where LR (n = 3495) or NS (n = 52 079) was administered on hospital day 1 were included; the median (IQR) age was 30 (25-37) years. Patients who received LR had more HFDs compared with those who received NS (marginal mean difference, 0.4; 95% CI, 0.1-0.6 days). Patients who received LR also had shorter hospital lengths of stay (marginal mean difference, -0.4; 95% CI, -0.7 to -0.1 days) and lower risk of 30-day readmission (marginal risk difference, -5.8%; 95% CI, -9.8% to -1.8%). 
Differences in HFDs between LR and NS were heterogenous based on fluid volume received: among patients who received less than 2 L, there was no difference in LR vs NS; among those who received 2 or more L, LR was superior to NS. Conclusion and Relevance This cohort study found that, compared with NS, LR had a small but significant improvement in HFDs and secondary outcomes including 30-day readmission. These results suggest that, among patients with VOEs in whom clinicians plan to give volume resuscitation fluids on hospital admission, LR should be preferred over NS.

2024-09-06 — Maternal age and body mass index and risk of labor dystocia after spontaneous labor onset among nulliparous women: A clinical prediction model

Authors: N. Nathan, Thomas Bergholt, Christoffer Sejling, A. S. Ersbøll, Kim Ekelund, T. Gerds, Christiane Bourgin Folke Gam, Line Rode, H. Hegaard
Year: 2024
Publication Date: 2024-09-06
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0308018
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Introduction Obstetrics research has predominantly focused on the management and identification of factors associated with labor dystocia. Despite these efforts, clinicians currently lack the necessary tools to effectively predict a woman’s risk of experiencing labor dystocia. Therefore, the objective of this study was to create a predictive model for labor dystocia. Material and methods The study population included nulliparous women with a single baby in the cephalic presentation in spontaneous labor at term. With a cohort-based registry design utilizing data from the Copenhagen Pregnancy Cohort and the Danish Medical Birth Registry, we included women who had given birth from 2014 to 2020 at Copenhagen University Hospital–Rigshospitalet, Denmark. Logistic regression analysis, augmented by a super learner algorithm, was employed to construct the prediction model with candidate predictors pre-selected based on clinical reasoning and existing evidence. These predictors included maternal age, pre-pregnancy body mass index, height, gestational age, physical activity, self-reported medical condition, WHO-5 score, and fertility treatment. Model performance was evaluated using the area under the receiver operating characteristics curve (AUC) for discriminative capacity and Brier score for model calibration. Results A total of 12,445 women involving 5,525 events of labor dystocia (44%) were included. All candidate predictors were retained in the final model, which demonstrated discriminative ability with an AUC of 62.3% (95% CI:60.7–64.0) and Brier score of 0.24. Conclusions Our model represents an initial advancement in the prediction of labor dystocia utilizing readily available information obtainable upon admission in active labor. As a next step further model development and external testing across other populations is warranted. 
With time a well-performing model may be a step towards facilitating risk stratification and the development of a user-friendly online tool for clinicians.

2024-09-06 — Leveraging Machine Learning for Official Statistics: A Statistical Manifesto

Authors: M. Puts, David Salgado, Piet Daas
Year: 2024
Publication Date: 2024-09-06
Venue: arXiv.org
DOI: 10.48550/arXiv.2409.04365
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
It is important for official statistics production to apply ML with statistical rigor, as it presents both opportunities and challenges. Although machine learning has enjoyed rapid technological advances in recent years, its application does not possess the methodological robustness necessary to produce high quality statistical results. In order to account for all sources of error in machine learning models, the Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology. As a means of ensuring that ML models are both internally valid as well as externally valid, the TMLE model addresses issues such as representativeness and measurement errors. There are several case studies presented, illustrating the importance of applying more rigor to the application of machine learning in official statistics.

2024-09-01 — Identifying Racial Disparities in Utilization and Clinical Outcomes of Ambulatory Hip Arthroscopy: Analysis of Temporal Trends and Causal Inference via Machine Learning

Authors: Yining Lu, Kareme D. Alder, Erick M. Marigi, John P. Mickley, Malik E. Dancy, Mario Hevesi, Bruce A. Levy, A. Krych, K. Okoroha
Year: 2024
Publication Date: 2024-09-01
Venue: Orthopaedic Journal of Sports Medicine
DOI: 10.1177/23259671241257507
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Arthroscopic diagnosis and treatment of femoroacetabular pathology has experienced significant growth in the last 30 years; nevertheless, reduced utilization of orthopaedic procedures has been observed among the underrepresented population. Purpose/Hypothesis: The purpose of this study was to examine racial differences in case incidence rates, outcomes, and complications in patients undergoing hip arthroscopy. It was hypothesized that racial and ethnic minority patients would undergo hip arthroscopy at a decreased rate compared with their White counterparts but that there would be no differences in clinical outcomes. Study Design: Cross-sectional study. Methods: The State Ambulatory Surgery and Services Database and the State Emergency Department Database of New York were queried for patients undergoing hip arthroscopy between 2011 and 2017. Patients were stratified into White and racial and ethnic minority races, and intergroup comparisons were performed for utilization over time, total charges billed per encounter, 90-day emergency department (ED) visits, and revision hip arthroscopy. Temporal trends in the utilization of hip arthroscopy were identified, and racial differences in secondary outcomes were analyzed with a semiparametric method known as targeted maximum likelihood estimation (TMLE) backed by a library of machine learning algorithms. Results: A total of 9745 patients underwent hip arthroscopy during the study period, with 1081 patients of minority race (11.1%). White patients underwent hip arthroscopy at 5.68 (95% CI, 4.98-6.48) times the incidence rate of racial and ethnic minority patients; these incidence rates grew annually at a ratio of 1.11 in White patients compared with 1.03 in racial and ethnic minority patients (P < .001). 
Based on the TMLE, racial and ethnic minority patients were significantly more likely to incur higher costs (P < .001) and visit the ED within 90 days (P = .049) but had negligible differences in reoperation rates at a 2-year follow-up (P = .53). Subgroup analysis identified that higher likelihood for 90-day ED admissions among racial and ethnic minority patients compared with White patients was associated with Medicare insurance (P = .002), median income in the lowest quartile (P = .012), and residence in low-income neighborhoods (P = .006). Conclusion: Irrespective of insurance status, racial and ethnic minority patients undergo hip arthroscopy at a lower incidence and incur higher costs per surgical encounter.

2024-09-01 — Explainable Artificial Intelligence for Early Prediction of Pressure Injury Risk

Authors: J. Alderden, J. Johnny, Katie R Brooks, Andrew Wilson, Tracey L. Yap, Y. Zhao, Mark van der Laan, S. Kennerly
Year: 2024
Publication Date: 2024-09-01
Venue: American Journal of Critical Care
DOI: 10.4037/ajcc2024856
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
BACKGROUND Hospital-acquired pressure injuries (HAPIs) have a major impact on patient outcomes in intensive care units (ICUs). Effective prevention relies on early and accurate risk assessment. Traditional risk-assessment tools, such as the Braden Scale, often fail to capture ICU-specific factors, limiting their predictive accuracy. Although artificial intelligence models offer improved accuracy, their "black box" nature poses a barrier to clinical adoption. OBJECTIVE To develop an artificial intelligence-based HAPI risk-assessment model enhanced with an explainable artificial intelligence dashboard to improve interpretability at both the global and individual patient levels. METHODS An explainable artificial intelligence approach was used to analyze ICU patient data from the Medical Information Mart for Intensive Care. Predictor variables were restricted to the first 48 hours after ICU admission. Various machine-learning algorithms were evaluated, culminating in an ensemble "super learner" model. The model's performance was quantified using the area under the receiver operating characteristic curve through 5-fold cross-validation. An explainer dashboard was developed (using synthetic data for patient privacy), featuring interactive visualizations for in-depth model interpretation at the global and local levels. RESULTS The final sample comprised 28 395 patients with a 4.9% incidence of HAPIs. The ensemble super learner model performed well (area under curve = 0.80). The explainer dashboard provided global and patient-level interactive visualizations of model predictions, showing each variable's influence on the risk-assessment outcome. CONCLUSION The model and its dashboard provide clinicians with a transparent, interpretable artificial intelligence-based risk-assessment system for HAPIs that may enable more effective and timely preventive interventions.
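
Illustrative note: the abstract above evaluates several algorithms by 5-fold cross-validation and combines them into a super learner. The sketch below shows the simplest variant of that idea, a discrete super learner that picks the candidate with the lowest cross-validated risk. It is an assumption-laden toy: it uses squared-error loss on simulated data rather than AUC on ICU data, the two candidate learners are stand-ins, and all function names are hypothetical.

```python
import numpy as np

def fit_mean(X, y):
    """Candidate 1: predict the training mean for everyone."""
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_ols(X, y):
    """Candidate 2: ordinary least squares with an intercept."""
    b = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)[0]
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ b

def cv_risks(X, y, learners, n_folds=5, seed=0):
    """Cross-validated mean squared error of each candidate learner."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds   # balanced random fold labels
    risks = {}
    for name, fit in learners.items():
        sq_err = np.empty(len(y))
        for k in range(n_folds):
            tr, te = folds != k, folds == k
            predict = fit(X[tr], y[tr])          # fit on training folds only
            sq_err[te] = (y[te] - predict(X[te])) ** 2
        risks[name] = sq_err.mean()
    return risks

# Simulated data (made up) where the linear candidate should win.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=500)

risks = cv_risks(X, y, {"mean": fit_mean, "ols": fit_ols})
best = min(risks, key=risks.get)                 # discrete super learner choice
print(risks, best)
```

An ensemble super learner would go one step further and weight the candidates' cross-validated predictions instead of selecting a single winner.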

2024-08-30 — Identification of Power Quality Disturbances in Electrical Distribution System using Fast Fourier Transforms and Super Learner Ensembles

Authors: Supakan Janthong, P. Phukpattaranont
Year: 2024
Publication Date: 2024-08-30
Venue: Journal of Advanced Research in Applied Mechanics
DOI: 10.37934/aram.124.1.3960
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Currently, the use of non-linear loads and equipment, as well as renewable energy sources injected into the power system, tends to increase. As a result, the waveform of the electrical signal changes, and distortion occurs in the distribution system, which affects the quality and reliability of the electrical system. Importantly, sometimes this leads to malfunctions in protection equipment. This paper presents the algorithm for power quality disturbance (PQD) identification in electrical distribution systems, which involves three main steps: (1) Generating simulated waveforms using a signal processing approach; (2) extracting features using the Fast Fourier Transforms (FFT) technique; and (3) identifying the type of PQD using Super Learner Ensembles (SLE), which employs cross-validation to assess the performance of multiple machine learning models. Subsequently, the model’s efficiency is verified and tested using data from electronic energy meters installed in the distribution system of the Provincial Electricity Authority (PEA). The accuracy resulting from synthetic and experimental data sets is 99.90% and 99.69%, respectively. The results indicate that the model performs well in identifying power quality disturbances and achieves high accuracy.

2024-08-28 — Replicated blood-based biomarkers for myalgic encephalomyelitis not explicable by inactivity

Authors: S. Beentjes, Artur Miralles Méharon, Julia Kaczmarczyk, Amanda Cassar, G. L. Samms, N. Hejazi, A. Khamseh, C. P. Ponting
Year: 2024
Publication Date: 2024-08-28
Venue: EMBO Molecular Medicine
DOI: 10.1038/s44321-025-00258-8
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a common female-biased disease. ME/CFS diagnosis is hindered by the absence of biomarkers that are unaffected by patients’ low physical activity level. Our analysis used semi-parametric efficient estimators, an initial Super Learner fit followed by a one-step correction, three mediators, and natural direct and indirect estimands, to decompose the average effect of ME/CFS status on molecular and cellular traits. For this, we used UK Biobank data for up to 1455 ME/CFS cases and 131,303 controls. Hundreds of traits differed significantly between cases and controls, including 116 significant for both female and male cohorts. These were indicative of chronic inflammation, insulin resistance and liver disease. Nine of 14 traits were replicated in the smaller All-of-Us cohort. Results cannot be explained by restricted activity: via an activity mediator, ME/CFS status significantly affected only 1 of 3237 traits. Individuals with post-exertional malaise show stronger biomarker differences. Single traits could not cleanly distinguish cases from controls. Nevertheless, these results keep alive the future ambition of a blood-based biomarker panel for accurate ME/CFS diagnosis. There are no cellular or molecular biomarkers diagnostic of myalgic encephalomyelitis (also known as chronic fatigue syndrome [ME/CFS]). We find hundreds of blood-based traits are different, as a population average, between ME/CFS cases and healthy controls. Biomarkers are indicative of chronic inflammation, insulin resistance and liver disease. Molecular and cellular differences cannot be explained by variations in physical activity. Results should accelerate research into the minimum panel required for accurately diagnosing ME/CFS.

2024-08-21 — Evaluating the Economic Impacts of the G20 Compact Initiative: Evidence from Causal Inference Using Advanced Machine Learning Techniques

Authors: T. Gbadegesin, N. Yaméogo
Year: 2024
Publication Date: 2024-08-21
Venue: Journal of Sustainable Development
DOI: 10.5539/jsd.v17n5p56
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
The G20 Compact with Africa (CwA) initiative, launched in 2017 under the German G20 Presidency, aims to enhance the attractiveness of private investment in Africa by improving member countries’ macro, business, and financing frameworks. This study evaluates the CwA initiative's impact on FDI, GDP per capita, gross capital formation, exports, and employment using targeted maximum likelihood estimation. In the initial Q model, we employed machine learning models like Random Forest, Gradient Boosting, and XGBoost to estimate the outcome given the covariates. Subsequently, we used OLS to update the initial estimate through the clever covariate to improve the efficiency and accuracy of the estimated treatment effect. Our findings indicate that the CwA initiative is significantly associated with increased FDI and export growth in member countries, but these gains have not yet led to broader economic growth, such as improvements in gross capital formation and GDP per capita.
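
Illustrative note: the abstract above describes a two-stage TMLE, with an initial outcome ("Q") model followed by an OLS update through the clever covariate. The sketch below walks through those stages for the average treatment effect with a continuous outcome. It is a sketch under stated assumptions, not the authors' implementation: the data are simulated, plain linear and logistic regressions stand in for the machine learning models named in the abstract, and every function name is hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression (stand-in propensity model)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def tmle_ate(W, A, Y, bound=0.025):
    n = len(Y)
    ones, zeros = np.ones(n), np.zeros(n)
    # Step 1: initial outcome model Q(A, W); least squares is a stand-in here.
    coef = np.linalg.lstsq(np.column_stack([ones, A, W]), Y, rcond=None)[0]
    Q_obs = np.column_stack([ones, A, W]) @ coef
    Q1 = np.column_stack([ones, ones, W]) @ coef
    Q0 = np.column_stack([ones, zeros, W]) @ coef
    # Step 2: propensity score g(W) = P(A = 1 | W), bounded away from 0 and 1.
    Xg = np.column_stack([ones, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), bound, 1.0 - bound)
    # Step 3: clever covariate H(A, W).
    H = A / g - (1 - A) / (1 - g)
    # Step 4: no-intercept OLS of the residual Y - Q on H gives epsilon.
    eps = np.sum(H * (Y - Q_obs)) / np.sum(H ** 2)
    # Step 5: update both counterfactual predictions and average the contrast.
    Q1_star = Q1 + eps / g
    Q0_star = Q0 - eps / (1 - g)
    return np.mean(Q1 - Q0), np.mean(Q1_star - Q0_star)

# Simulated data (made up) with a true average treatment effect of 2.
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.5 * W[:, 0]))
Y = 1.0 + 2.0 * A + W[:, 0] - 0.5 * W[:, 1] + rng.normal(size=n)

psi_init, psi_tmle = tmle_ate(W, A, Y)
print(psi_init, psi_tmle)
```

Because the stand-in outcome model is correctly specified for this simulation, the update changes the estimate only slightly. The targeting step matters more when the initial fit is biased.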

2024-08-16 — Pseudo-random Number Generator Influences on Average Treatment Effect Estimates Obtained with Machine Learning

Authors: Ashley I. Naimi, Ya-Hui Yu, Lisa M. Bodnar
Year: 2024
Publication Date: 2024-08-16
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001785
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: The use of machine learning to estimate exposure effects introduces a dependence between the results of an empirical study and the value of the seed used to fix the pseudo-random number generator. Methods: We used data from 10,038 pregnant women and a 10% subsample (N = 1004) to examine the extent to which the risk difference for the relation between fruit and vegetable consumption and preeclampsia risk changes under different seed values. We fit an augmented inverse probability weighted estimator with two Super Learner algorithms: a simple algorithm including random forests and single-layer neural networks and a more complex algorithm with a mix of tree-based, regression-based, penalized, and simple algorithms. We evaluated the distributions of risk differences, standard errors, and P values that result from 5000 different seed value selections. Results: Our findings suggest important variability in the risk difference estimates, as well as an important effect of the stacking algorithm used. The interquartile range width of the risk differences in the full sample with the simple algorithm was 13 per 1000. However, all other interquartile ranges were roughly an order of magnitude lower. The medians of the distributions of risk differences differed according to the sample size and the algorithm used. Conclusions: Our findings add another dimension of concern regarding the potential for “p-hacking,” and further warrant the need to move away from simplistic evidentiary thresholds in empirical research. When empirical results depend on pseudo-random number generator seed values, caution is warranted in interpreting these results.
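
Illustrative note: the seed dependence described in the abstract above can be reproduced in miniature. In the sketch below the data are held fixed and only the seed that assigns cross-fitting folds changes, yet the augmented inverse probability weighted estimate still moves. Everything here is an assumption for illustration: the data are simulated, treatment is randomized with a known probability of 0.5, least squares stands in for Super Learner, and the function name `crossfit_aipw` is hypothetical.

```python
import numpy as np

def crossfit_aipw(W, A, Y, seed, n_folds=2, g=0.5):
    """Cross-fitted AIPW estimate of the average treatment effect.

    The seed only controls how observations are split into folds.
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    folds = rng.permutation(n) % n_folds
    X = np.column_stack([np.ones(n), W])
    phi = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        # Outcome regressions fitted on the training folds, one per arm.
        b1 = np.linalg.lstsq(X[tr & (A == 1)], Y[tr & (A == 1)], rcond=None)[0]
        b0 = np.linalg.lstsq(X[tr & (A == 0)], Y[tr & (A == 0)], rcond=None)[0]
        q1, q0 = X[te] @ b1, X[te] @ b0
        a, y = A[te], Y[te]
        # AIPW score evaluated on the held-out fold.
        phi[te] = q1 - q0 + a / g * (y - q1) - (1 - a) / (1 - g) * (y - q0)
    return phi.mean()

# One fixed simulated data set (made up); the true effect is 1.
data_rng = np.random.default_rng(0)
n = 400
W = data_rng.normal(size=(n, 3))
A = data_rng.binomial(1, 0.5, size=n)
Y = A + W @ np.array([1.0, -1.0, 0.5]) + data_rng.normal(size=n)

# Same data, 100 different seeds for the fold assignment.
estimates = np.array([crossfit_aipw(W, A, Y, seed) for seed in range(100)])
q25, q75 = np.percentile(estimates, [25, 75])
print(np.median(estimates), q75 - q25)
```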

2024-08-13 — A Comparison of Methods for Estimating the Average Treatment Effect on the Treated for Externally Controlled Trials

Authors: Huan Wang, Fei Wu, Yeh-Fong Chen
Year: 2024
Publication Date: 2024-08-13
Venue: The New England Journal of Statistics in Data Science
DOI: 10.51387/25-NEJSDS77
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
While randomized trials may be the gold standard for evaluating the effectiveness of the treatment intervention, in some special circumstances, single-arm clinical trials utilizing external control may be considered. The causal treatment effect of interest for single-arm trials is usually the average treatment effect on the treated (ATT) rather than the average treatment effect (ATE). Although methods have been developed to estimate the ATT, the selection and use of these methods require a thorough comparison and in-depth understanding of the advantages and disadvantages of these methods. In this study, we conducted simulations under different identifiability assumptions to compare the performance metrics (e.g., bias, standard deviation (SD), mean squared error (MSE), type I error rate) for a variety of methods, including the regression model, propensity score matching (PSM), Mahalanobis distance matching (MDM), coarsened exact matching, inverse probability weighting, augmented inverse probability weighting (AIPW), AIPW with SuperLearner, and targeted maximum likelihood estimator (TMLE) with SuperLearner. Our simulation results demonstrate that the doubly robust methods in general have smaller biases than other methods. In terms of SD, nonmatching methods in general have smaller SDs than matching-based methods. The performance of MSE is a trade-off between the bias and SD, and no method consistently performs better in terms of MSE. The identifiability assumptions are critical to the models’ performance: Violation of the positivity assumption can lead to a significant inflation of type I errors in some methods; violation of the unconfoundedness assumption can lead to a large bias for all methods. According to the simulation results, under most scenarios we examined, PSM and MDM methods perform best overall in terms of type I error control. 
However, they in general have worse performance in the estimation accuracy compared to doubly robust methods given that the identifiability assumptions are not severely violated.

2024-08-09 — A Density Ratio Super Learner

Authors: Wencheng Wu, David Benkeser
Year: 2024
Publication Date: 2024-08-09
Venue: arXiv.org
DOI: 10.48550/arXiv.2408.04796
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
The estimation of the ratio of two probability density functions is of great interest in many statistics fields, including causal inference. In this study, we develop an ensemble estimator of density ratios with a novel loss function based on super learning. We show that this novel loss function is qualified for building super learners. Two simulations corresponding to mediation analysis and longitudinal modified treatment policy in causal inference, where density ratios are nuisance parameters, are conducted to show our density ratio super learner's performance empirically.

2024-08-08 — Revisiting diabetes risk of olanzapine versus aripiprazole in serious mental illness care

Authors: Denis Agniel, Sharon-Lise T. Normand, John W. Newcomer, Katya Zelevinsky, Jason Poulos, Jeannette Tsuei, Marcela Horvitz-Lennon
Year: 2024
Publication Date: 2024-08-08
Venue: BJPsych Open
DOI: 10.1192/bjo.2024.727
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background Exposure to second-generation antipsychotics (SGAs) carries a risk of type 2 diabetes, but questions remain about the diabetogenic effects of SGAs. Aims To assess the diabetes risk associated with two frequently used SGAs. Method This was a retrospective cohort study of adults with schizophrenia, bipolar I disorder or severe major depressive disorder (MDD) exposed during 2008–2013 to continuous monotherapy with aripiprazole or olanzapine for up to 24 months, with no pre-period exposure to other antipsychotics. Newly diagnosed type 2 diabetes was quantified with targeted minimum loss-based estimation; risk was summarised as the restricted mean survival time (RMST), the average number of diabetes-free months. Sensitivity analyses were used to evaluate potential confounding by indication. Results Aripiprazole-treated patients had fewer diabetes-free months compared with olanzapine-treated patients. RMSTs were longer in olanzapine-treated patients, by 0.25 months [95% CI: 0.14, 0.36], 0.16 months [0.02, 0.31] and 0.22 months [0.01, 0.44] among patients with schizophrenia, bipolar I disorder and severe MDD, respectively. Although some sensitivity analyses suggest a risk of unobserved confounding, E-values indicate that this risk is not severe. Conclusions Using robust methods and accounting for exposure duration effects, we found a slightly higher risk of type 2 diabetes associated with aripiprazole compared with olanzapine monotherapy regardless of diagnosis. If this result was subject to unmeasured selection despite our methods, it would suggest clinician success in identifying olanzapine candidates with low diabetes risk. Confirmatory research is needed, but this insight suggests a potentially larger role for olanzapine in the treatment of well-selected patients, particularly for those with schizophrenia, given the drug's effectiveness advantage among them.

2024-08-05 — An Online Meta-Level Adaptive-Design Framework with Targeted Learning Inference: Applications to Evaluating and Utilizing Surrogate Outcomes in Adaptive Designs

Authors: Wenxin Zhang, Aaron Hudson, Maya Petersen, M. V. D. Laan
Year: 2024
Publication Date: 2024-08-05
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Adaptive designs are increasingly used in clinical trials and online experiments to improve participant outcomes by dynamically updating treatment allocation based on accumulating data. However, in practice, experimenters often consider multiple candidate designs, each with distinct trade-offs, while only one can be implemented at a time, leaving benefits and costs of alternative designs unobserved and unquantified. To address this, we propose a novel meta-level adaptive design framework that enables real-time, data-driven evaluation and selection among candidate adaptive designs. Specifically, we define a new class of causal estimands to evaluate adaptive designs, estimate them with the Targeted Maximum Likelihood Estimation framework, which yields an asymptotically normal estimator accommodating dependence in adaptive-design data without parametric assumptions, and support online design selection. We further apply this framework to a motivating example where multiple surrogates of a long-term primary outcome are considered for updating randomization probabilities in adaptive experiments. Unlike existing surrogate evaluation methods, our approach comprehensively quantifies the utility of surrogates to accelerate detection of heterogeneous treatment effects, expedite updates to treatment randomization and improve participant outcomes, facilitating dynamic selection among surrogate-guided adaptive designs. Overall, our framework provides a unified tool for evaluating opportunities and costs of various adaptive designs and guiding real-time decision-making in adaptive experiments.

2024-08-01 — Metabolite Predictors of Breast and Colorectal Cancer Risk in the Women’s Health Initiative

Authors: Sandi L Navarro, Brian D. Williamson, Ying Huang, G. N. Nagana Gowda, Daniel Raftery, Lesley F. Tinker, C. Zheng, Shirley AA Beresford, Hayley Purcell, Danijel Djukovic, Haiwei Gu, H. Strickler, F. Tabung, Ross L Prentice, M.L. Neuhouser, Johanna W. Lampe
Year: 2024
Publication Date: 2024-08-01
Venue: Metabolites
DOI: 10.3390/metabo14080463
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Metabolomics has been used extensively to capture the exposome. We investigated whether prospectively measured metabolites provided predictive power beyond well-established risk factors among 758 women with adjudicated cancers [n = 577 breast (BC) and n = 181 colorectal (CRC)] and n = 758 controls with available specimens (collected mean 7.2 years prior to diagnosis) in the Women’s Health Initiative Bone Mineral Density subcohort. Fasting samples were analyzed by LC-MS/MS and lipidomics in serum, plus GC-MS and NMR in 24 h urine. For feature selection, we applied LASSO regression and Super Learner algorithms. Prediction models were subsequently derived using logistic regression and Super Learner procedures, with performance assessed using cross-validation (CV). For BC, metabolites did not increase predictive performance over established risk factors (CV-AUCs~0.57). For CRC, prediction increased with the addition of metabolites (median CV-AUC across platforms increased from ~0.54 to ~0.60). Metabolites related to energy metabolism: adenosine, 2-hydroxyglutarate, N-acetyl-glycine, taurine, threonine, LPC (FA20:3), acetate, and glycerate; protein metabolism: histidine, leucic acid, isoleucine, N-acetyl-glutamate, allantoin, N-acetyl-neuraminate, hydroxyproline, and uracil; and dietary/microbial metabolites: myo-inositol, trimethylamine-N-oxide, and 7-methylguanine, consistently contributed to CRC prediction. Energy metabolism may play a key role in the development of CRC and may be evident prior to disease development.

2024-07-25 — Doubly Robust Targeted Estimation of Conditional Average Treatment Effects for Time-to-event Outcomes with Competing Risks

Authors: Runjia Li, V. Talisa, Chung-Chou H. Chang
Year: 2024
Publication Date: 2024-07-25
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In recent years, precision treatment strategies have gained significant attention in medical research, particularly for patient care. We propose a novel framework for estimating conditional average treatment effects (CATE) in time-to-event data with competing risks, using ICU patients with sepsis as an illustrative example. Our approach, based on cumulative incidence functions and targeted maximum likelihood estimation (TMLE), achieves both asymptotic efficiency and double robustness. The primary contribution of this work lies in our derivation of the efficient influence function for the targeted causal parameter, CATE. We established the theoretical proofs for these properties, and subsequently confirmed them through simulations. Our TMLE framework is flexible, accommodating various regression and machine learning models, making it applicable in diverse scenarios. In order to identify variables contributing to treatment effect heterogeneity and to facilitate accurate estimation of CATE, we developed two distinct variable importance measures (VIMs). This work provides a powerful tool for optimizing personalized treatment strategies, furthering the pursuit of precision medicine.

2024-07-16 — Machine learning for causal inference: An application to ECLS-K data

Authors: Jiahe Li
Year: 2024
Publication Date: 2024-07-16
Venue: Applied and Computational Engineering
DOI: 10.54254/2755-2721/76/20240560
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
This paper explores the use of machine learning for causal inference to estimate the average treatment effect of special education services on fifth-grade math scores. Causal inference is the study of the relationship between cause and effect when changes in one variable directly affect another variable. The use of machine learning techniques in causal inference problems has been growing rapidly, offering advantages over traditional methods such as propensity score matching. This paper compares the performance of four methods: Ordinary Least Squares (OLS), Multi-Layer Perceptron (MLP), Targeted Maximum Likelihood Estimation (TMLE), and Bayesian Additive Regression Trees (BART) in estimating the average treatment effect of special education services on fifth-grade math scores. This study utilizes the Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 (ECLS-K) dataset. A factor analysis is conducted to identify the key variables that influence math performance, paving the way for examining their causal effects. Our results show that BART outperforms the other methods in accuracy and robustness and that receiving special education services does not have a causal effect on math scores. This paper discusses the implications and limitations of our findings and suggests directions for future study.

2024-07-01 — SUPER-COUGH: A Super Learner-based ensemble machine learning method for detecting disease on cough acoustic signals

Authors: Elif Kevser Topuz, Y. Kaya
Year: 2024
Publication Date: 2024-07-01
Venue: Biomedical Signal Processing and Control
DOI: 10.1016/j.bspc.2024.106165
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-07-01 — P-660 Finding the optimal starting dose of gonadotropins for an ovarian stimulation treatment through Machine Learning

Authors: M. Dellenbach, H. Bonneau-Chloup, P. Bian, X. Hurst, J. Josse, C. Rongieres
Year: 2024
Publication Date: 2024-07-01
Venue: Human Reproduction
DOI: 10.1093/humrep/deae108.990
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Can machine learning algorithms provide an individualized starting dose of gonadotropins to maximize the number of oocytes retrieved in patients undergoing ovarian stimulation? Methods of Policy Learning based on Causal Inference and Machine Learning optimize starting doses of gonadotropins with a substantial gain of oocytes under appropriate treatment. Despite attempts toward standardization, there has been no consensus about establishing the optimal starting dose of gonadotropins to be used in ovarian stimulation. In the real world, medical practice differs considerably across countries and even among fertility clinics nationally. The machine learning community has proposed a variety of models for doing this, but with limited comparisons; causal inference can provide the statistical tools to enable the best model to be chosen and allow its explicability. In a retrospective single-center study, we reviewed the data of 11,436 cycles of ovarian stimulation for IVF between 2012 and 2023. In order to estimate the causal effect of the starting dose of gonadotropins on the number of oocytes retrieved, we selected a relevant subset of confounding covariates, including age, Antral Follicle Count (AFC), Anti-Müllerian Hormone (AMH) and oestradiol (E2) rates. We used five different Machine Learning models: Linear Regression, Regression Forest, Super Learner (SL), Multi-Arm Causal Forest (MACF) and a model based on the Targeted Maximum Likelihood Estimation (TMLE). We employed these models for Policy Learning through an Outcome Regression Modeling approach. We compared the gain of oocytes obtained through the optimal policy of our different models and analyzed the variables which played a crucial role in determining the optimal dose.
We use two approaches to compare our models. The first gives the expected average gain of oocytes under the optimal policy: in this case, the Linear Regression is the best with an average gain of 0.90 oocytes, followed by the MACF (0.80), the SL (0.58), the TMLE (0.44) and the Regression Forest (0.18). Considering the lack of heterogeneity of our database, we use a second approach, an Augmented Inverse Propensity Score based value, to compare with more robustness the performance of the optimal policy of each model. In this case the MACF (an explainable model specifically designed for causal inference) is the best with a gain of 0.50 oocytes, followed by the SL (0.42), the Linear Regression (0.37), the TMLE (0.23) and the Regression Forest (0.18). In general, our models recommend lower doses; for instance, 20% of the patients are recommended by the MACF to reduce their initial doses of gonadotropins by on average 100 IU. The most significant covariates in the models are the AFC, the BMI, and the rate of AMH, with p-values < 0.001 in the Linear Regression, which has an R-squared of 0.418. Full IVF history was not included in patients who had previous IVF attempts, but only data from the last previous attempt was considered. Also, the study is monocentric and more heterogeneity will be present considering other centers. This study needs to be completed by data from different centers in France and around the world to bring diversity to the patients’ data and to the treatment prescriptions in order to improve external validity. Algorithm recommendations and their adherence need to be tested prospectively.

2024-06-22 — How Does CEO Duality Influence ESG Scores in Hospitality and Tourism Companies? Confounding Roles of Governance Mechanisms and Financial Indicators

Authors: H. Arıcı, O. Aladag, Mehmet ali Koseoglu
Year: 2024
Publication Date: 2024-06-22
Venue: Journal of Hospitality & Tourism Research
DOI: 10.1177/10963480241266154
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Previous studies have yielded inconsistent results about the impact of CEO duality on corporate performance in the hospitality and tourism (H&T) industry. To further delve into this relationship, we investigated the causal relationship between CEO duality and environmental, social, and governance (ESG) performance under various board characteristics and financial indicators. The data from the Thomson Reuters Eikon database were evaluated using a machine learning technique that included targeted maximum likelihood estimation (TMLE), augmented inverse probability weighting (AIPW), and neural network analysis, all of which are doubly robust estimators with cross-fitting. The findings suggest that CEO duality negatively impacts environmental pillar scores but not other outcomes (i.e., governance and social pillar scores). Among the governance practices and financial indicators, policy executive compensation performance, policy executive compensation ESG performance, and return on invested capital (ROIC) have positive relations with total ESG scores. The results have important ramifications for helping H&T companies develop effective boards of directors and governance systems, as well as achieve targeted ESG performance objectives.

2024-06-16 — Data-Adaptive Identification of Effect Modifiers through Stochastic Shift Interventions and Cross-Validated Targeted Learning

Authors: David McCoy, Wenxin Zhang, Alan Hubbard, M. V. D. Laan, Alejandro Schuler
Year: 2024
Publication Date: 2024-06-16
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
In epidemiology, identifying subpopulations that are particularly vulnerable to exposures and those who may benefit differently from exposure-reducing interventions is essential. Factors such as age, gender-specific vulnerabilities, and physiological states such as pregnancy are critical for policymakers when setting regulatory guidelines. However, current semi-parametric methods for estimating heterogeneous treatment effects are often limited to binary exposures and can function as black boxes, lacking clear, interpretable rules for subpopulation-specific policy interventions. This study introduces a novel method that uses cross-validated targeted minimum loss-based estimation (TMLE) paired with a data-adaptive target parameter strategy to identify subpopulations with the most significant differential impact of simulated policy interventions that reduce exposure. Our approach is assumption-lean, allowing for the integration of machine learning while still yielding valid confidence intervals. We demonstrate the robustness of our methodology through simulations and application to data from the National Health and Nutrition Examination Survey. Our analysis of NHANES data on persistent organic pollutants (POPs) and leukocyte telomere length (LTL) identified age as a significant effect modifier. Specifically, we found that exposure to 3,3',4,4',5-pentachlorobiphenyl (PCNB) consistently had a differential impact on LTL, with a one-standard deviation reduction in exposure leading to a more pronounced increase in LTL among younger populations compared to older ones. We offer our method as an open-source software package, EffectXshift, enabling researchers to investigate the effect modification of continuous exposures. The EffectXshift package provides clear and interpretable results, informing targeted public health interventions and policy decisions.

2024-06-12 — Incidence of acute kidney injury and attributive mortality in acute respiratory distress syndrome randomized trials

Authors: Edoardo Antonucci, Bruno Garcia, David Chen, M. Matthay, Kathleen D Liu, Matthieu Legrand
Year: 2024
Publication Date: 2024-06-12
Venue: Intensive Care Medicine
DOI: 10.1007/s00134-024-07485-6
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The development of acute kidney injury (AKI) after the acute respiratory distress syndrome (ARDS) reduces the chance of organ recovery and survival. The purpose of this study was to examine the AKI rate and attributable mortality in ARDS patients. We performed an individual patient-data analysis including 10 multicenter randomized controlled trials conducted over 20 years. We employed a Super Learner ensemble technique, including a time-dependent analysis, to estimate the adjusted risk of AKI. We calculated the mortality attributable to AKI using an inverse probability of treatment weighting estimator integrated with the Super Learner. There were 5148 patients included in this study. The overall incidence of AKI was 43.7% (n = 2251). The adjusted risk of AKI ranged from 38.8% (95% confidence interval [CI], 35.7 to 41.9%) in ARMA, to 55.8% in ROSE (95% CI, 51.9 to 59.6%). 37.1% recovered rapidly from AKI, with a significantly lower recovery rate in recent trials (P < 0.001). The 90-day excess in mortality attributable to AKI was 15.4% (95% CI, 12.8 to 17.9%). It decreased from 25.4% in ARMA (95% CI, 18.7 to 32%), to 11.8% in FACTT (95% CI, 5.5 to 18%) and then remained rather stable over time. The 90-day overall excess in mortality attributable to acute kidney disease was 28.4% (95% CI, 25.3 to 31.5%). The incidence of AKI appears to be stable over time in patients with ARDS enrolled in randomized trials. The development of AKI remains a significant contributing factor to mortality. These estimates are essential for designing future clinical trials for AKI prevention or treatment.

2024-06-11 — A targeted likelihood estimation comparing cefepime and piperacillin/tazobactam in critically ill patients with community-acquired pneumonia (CAP)

Authors: Cristian C Serrano-Mayorga, Sara Duque, Elsa D. Ibáñez-Prada, Esteban Garcia-Gallo, María P Rojas Arrieta, Alirio Bastidas, Alejandro Rodríguez, I. Martín-Loeches, L. F. Reyes
Year: 2024
Publication Date: 2024-06-11
Venue: Scientific Reports
DOI: 10.1038/s41598-024-64444-3
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Cefepime and piperacillin/tazobactam are antimicrobials recommended by IDSA/ATS guidelines for the empirical management of patients admitted to the intensive care unit (ICU) with community-acquired pneumonia (CAP). Concerns have been raised about which should be used in clinical practice. This study aims to compare the effect of cefepime and piperacillin/tazobactam in critically ill CAP patients through a targeted maximum likelihood estimation (TMLE). A total of 2026 ICU-admitted patients with CAP were included. Among them, 47% presented respiratory failure and 27% developed septic shock. A total of 68% received cefepime and 32% piperacillin/tazobactam-based treatment. After running the TMLE, we found that cefepime and piperacillin/tazobactam-based treatments have comparable 28-day, hospital, and ICU mortality. Additionally, age, PTT, serum potassium and temperature were associated with preferring cefepime over piperacillin/tazobactam (OR 1.14 95% CI [1.01–1.27], p = 0.03), (OR 1.14 95% CI [1.03–1.26], p = 0.009), (OR 1.1 95% CI [1.01–1.22], p = 0.039) and (OR 1.13 95% CI [1.03–1.24], p = 0.014), respectively. Our study found a similar mortality rate among ICU-admitted CAP patients treated with cefepime and piperacillin/tazobactam. Clinicians may consider factors such as availability and safety profiles when making treatment decisions.

2024-06-09 — HAL-Based Plug-in Estimation with Pointwise Asymptotic Normality of the Causal Dose-Response Curve

Authors: Junming Shi, Wenxin Zhang, Alan Hubbard, M. V. D. Laan
Year: 2024
Publication Date: 2024-06-09
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Estimating and obtaining reliable inference for the marginally adjusted causal dose-response curve for continuous treatments without relying on parametric assumptions is a well-known statistical challenge. Parametric models risk introducing significant bias through model misspecification, compromising the accurate representation of the underlying data and dose-response relationship. On the other hand, nonparametric models face difficulties as the dose-response curve is not pathwise differentiable, preventing consistent estimation at standard rates. The Highly Adaptive Lasso (HAL) maximum likelihood estimator offers a promising approach to this issue. In this paper, we introduce a HAL-based plug-in estimator for the causal dose-response curve, bridge theoretical development and empirical application, and assess its empirical performance against other estimators. This work emphasizes not just theoretical proofs, but also demonstrates their application through comprehensive simulations, thereby filling an essential gap between theory and practice. Our comprehensive simulations demonstrate that the HAL-based estimator achieves pointwise asymptotic normality with valid inference and consistently outperforms existing approaches for estimating the causal dose-response curve.

2024-06-06 — Causal inference in randomized trials with partial clustering

Authors: Joshua R Nugent, E. Kakande, G. Chamie, J. Kabami, A. Owaraganise, Diane V. Havlir, M. Kamya, L. Balzer
Year: 2024
Publication Date: 2024-06-06
Venue: Clinical Trials
DOI: 10.1177/17407745251333779
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background: Participant dependence, if present, must be accounted for in the analysis of randomized trials. This dependence, also referred to as “clustering,” can occur in one or more trial arms. This dependence may predate randomization or arise after randomization. We examine three trial designs: one “fully clustered” (where all participants are dependent within clusters or groups) and two “partially clustered” (where some participants are dependent within clusters and some participants are completely independent of all others). Methods: For these three designs, we (1) use causal models to non-parametrically describe the data generating process and formalize the dependence in the observed data distribution; (2) develop a novel implementation of targeted minimum loss-based estimation for analysis; (3) evaluate the finite-sample performance of targeted minimum loss-based estimation and common alternatives via a simulation study; and (4) apply the methods to real-data from the SEARCH-IPT trial. Results: We show that the two randomization schemes resulting in partially clustered trials have the same dependence structure, enabling use of the same statistical methods for estimation and inference of causal effects. Our novel targeted minimum loss-based estimation approach leverages covariate adjustment and machine learning to improve precision and facilitates estimation of a large set of causal effects. In simulations, we demonstrate that targeted minimum loss-based estimation achieves comparable or markedly higher statistical power than common alternatives for these partially clustered designs. Finally, application of targeted minimum loss-based estimation to real data from the SEARCH-IPT trial resulted in 20%–57% efficiency gains, demonstrating the real-world consequences of our proposed approach. 
Conclusions: Partially clustered trial analysis can be made more efficient by implementing targeted minimum loss-based estimation, assuming care is taken to account for the dependent nature of the observed data.

2024-06-01 — The effect of an intervention to promote isoniazid preventive therapy on leadership and management abilities

Authors: C. Christian, E. Kakande, V. Nahurira, L.B Balzer, A. Owaraganise, J.R. Nugent, W. DiIeso, D. Rast, J. Kabami, J. J. Peretz, C. S. Camlin, S. Shade, M. Kamya, D. Havlir, G. Chamie
Year: 2024
Publication Date: 2024-06-01
Venue: Public Health Action
DOI: 10.5588/pha.24.0002
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
BACKGROUND Across sub-Saharan Africa, mid-level healthcare managers oversee implementation of national guidelines. It remains unclear whether leadership and management training can improve population health outcomes. METHODS We sought to evaluate leadership/management skills among district-level health managers in Uganda participating in the SEARCH-IPT randomised trial to promote isoniazid preventive therapy (IPT) for persons with HIV (PWH). The intervention, which led to higher IPT rates, included annual leadership/management training of managers. We conducted a cross-sectional survey assessing leadership/management skills among managers at trial completion. The survey evaluated self-reported use of leadership/management tools and general leadership/management. We conducted a survey among a sample of providers to understand the intervention’s impact. Targeted minimum loss-based estimation (TMLE) was used to compare responses between trial arms. RESULTS Of 163 managers participating in the SEARCH-IPT trial, 119 (73%) completed the survey. Intervention managers reported more frequent use of leadership/management tools taught in the intervention curriculum than control managers (+3.64, 95% CI 1.98–5.30, P < 0.001). There were no significant differences in self-reported leadership skills in the intervention as compared to the control group. Among providers, the average reported quality of guidance and supervision was significantly higher in intervention vs control districts (+1.08, 95% CI 0.63–1.53, P = 0.001). CONCLUSIONS A leadership and management training intervention increased the use of leadership/management tools among mid-level managers and resulted in higher perceived quality of supervision among providers in intervention vs control districts in Uganda. These findings suggest improved leadership/management among managers contributed to increased IPT use among PWH in the intervention districts of the SEARCH-IPT trial.

2024-06-01 — MSR48 Utilization of High-Dimensional Propensity Score and Targeted Maximum Likelihood Estimation with Machine Learning to Improve Causal Effect Estimation in Patients with Nonvalvular Atrial Fibrillation and Hypertension

Authors: D. Zhang, Y. Zhang, SW Ahn, S. Gruber, van der Laan M, R. Iyer, S. Reshef, MY Tian
Year: 2024
Publication Date: 2024-06-01
Venue: Value in Health
DOI: 10.1016/j.jval.2024.03.1481
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2024-05-31 — Measuring the efficiency of banks using high-performance ensemble technique

Authors: Huda H. Thabet, Saad M. Darwish, G. M. Ali
Year: 2024
Publication Date: 2024-05-31
Venue: Neural computing & applications (Print)
DOI: 10.1007/s00521-024-09929-y
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The importance of technology and managerial risk management in banks has increased due to the financial crisis. Banks are the most affected since there are so many of them with poor financial standing. Due to this problem, an unstable and inefficient financial system causes economic stagnation in both the banking sector and overall economy. Data envelopment analysis (DEA) has been used to examine decision-making units (DMUs) performance to enhance efficiency. Currently, with the rapid growth of big data, adding more DMUs will likely require a large amount of memory and CPU time on the computer system, which will be the biggest challenge. As a result, machine learning (ML) approaches have been used to analyze financial institution performance, but many of them have variances in predictions or model stability, making measuring bank efficiency extremely difficult. For this, ensemble learning is commonly used to evaluate the performance of financial institutions in this context. This paper presents a robust super learner ensemble technique for assessing bank efficiency, with four machine learning models serving as base learners. These models are the support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), and AdaBoost classifier (ADA) which represent the base learners and their results utilized to train the meta-learner. The super learner (SL) approach is an extension of the stacking technique, which generates an ensemble based on cross-validation. One important benefit of this cross-validation theory-based technique is that it can overcome the overfitting issue that plagues most other ensemble approaches. When SL and base learners were compared for their forecasting abilities using different statistical standards, the results showed that the SL is superior to the base learners, where different variable combinations were used. 
The SL had accuracy (ACC) of 0.8636–0.9545 and F1-score (F1) of 0.9143–0.9714, while the basic learners had ACC of 0.5909–0.8182 and F1 of 0.6897–0.9143. So, SL is highly recommended for improving the accuracy of financial data forecasts, even with limited financial data.

2024-05-30 — Intelligent recruitment decision support combining Transformer model and knowledge graph

Authors: Xiaoman Zhang
Year: 2024
Publication Date: 2024-05-30
Venue: 2024 International Conference on Machine Intelligence and Digital Applications
DOI: 10.1145/3662739.3672315
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
In the development of the digital economy today, data-driven enterprise management has gradually become the center of enterprise operations. Currently, many recruitment systems do not make good use of the characteristics of artificial intelligence (AI), especially in natural language and logical relevance. Therefore, this paper combines Transformer with knowledge graph to study an intelligent recruitment decision support system (DSS) for enterprise needs. By establishing a knowledge base of job requirements and job competency characteristics, and with the help of Transformer's super learning and comprehension functions, the interpretation and analysis of resumes and job responsibilities can be achieved. Research has found that the application of Transformer models and knowledge graphs in intelligent recruitment decision support systems can effectively improve the quality of recruitment decisions. The maximum prediction error of the system's predicted matching degree is 7.9%, and the minimum error is 0. The system can provide candidate matching scores and skill analysis, providing strong decision support for HR.

2024-05-27 — The joint survival super learner: A super learner for right-censored data

Authors: Anders Munch, T. A. Gerds
Year: 2024
Publication Date: 2024-05-27
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Risk prediction models are widely used to guide real-world decision-making in areas such as healthcare and economics, and they also play a key role in estimating nuisance parameters in semiparametric inference. The super learner is a machine learning framework that combines a library of prediction algorithms into a meta-learner using cross-validated loss. In the context of right-censored data, careful consideration must be given to both the choice of loss function and the estimation of expected loss. Moreover, estimators such as inverse probability of censoring weighting require accurate modeling and an estimator of the censoring distribution. We propose a novel approach to super learning for survival analysis that jointly evaluates candidate learners for both the event-time distribution and the censoring distribution. Our method imposes no restrictions on the algorithms included in the library, accommodates competing risks, and does not rely on a single pre-specified estimator of the censoring distribution. We establish a finite-sample bound on the average price we pay for using cross-validation, and show that this price vanishes asymptotically, up to poly-logarithmic terms, provided that the size of the library does not grow faster than at a polynomial rate in the sample size. We demonstrate the practical utility of our method using prostate cancer data and compare it to existing super learner algorithms for survival analysis using synthesized data.

2024-05-24 — Causal Machine Learning Methods and Use of Cross‐Fitting in Settings With High‐Dimensional Confounding

Authors: S. Ellul, Stijn Vansteelandt, John B. Carlin, M. Moreno-Betancur
Year: 2024
Publication Date: 2024-05-24
Venue: Statistics in Medicine
DOI: 10.1002/sim.70272
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Observational epidemiological studies commonly seek to estimate the causal effect of an exposure on an outcome. Adjustment for potential confounding bias in modern studies is challenging due to the presence of high‐dimensional confounding, which occurs when there are many confounders relative to sample size or complex relationships between continuous confounders and exposure and outcome. Doubly robust methods such as Augmented Inverse Probability Weighting (AIPW) and Targeted Maximum Likelihood Estimation (TMLE) have the potential to address these challenges, using data‐adaptive approaches and cross‐fitting, but despite recent advances, limited evaluation and guidance are available on their implementation in realistic settings where high‐dimensional confounding is present. Motivated by an early‐life cohort study, we conducted an extensive simulation study to compare the relative performance of AIPW and TMLE using data‐adaptive approaches for estimating the average causal effect (ACE). We evaluated the benefits of using cross‐fitting with a varying number of folds, as well as the impact of using a reduced versus full (larger, more diverse) library in the Super Learner ensemble learning approach used for implementation. We found that AIPW and TMLE performed similarly in most cases for estimating the ACE, but TMLE was more stable. Cross‐fitting improved the performance of both methods, but was more important for variance estimation and coverage than for point estimates, with the number of folds a less important consideration. Using a full Super Learner library was important to reduce bias and variance in complex scenarios typical of modern health research studies.

2024-05-23 — Cognitive-Behavioral-Based Physical Therapy for Improving Recovery After a Traumatic Lower-Extremity Injury

Authors: Kristin R Archer
Year: 2024
Publication Date: 2024-05-23
Venue: Journal of Bone and Joint Surgery. American volume
DOI: 10.2106/JBJS.23.01234
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Lower-extremity injuries can result in severe impairment and substantial years lived with a disability. Persistent pain and psychological distress are risk factors for poor long-term outcomes and negatively influence the recovery process following a traumatic injury. Cognitive-behavioral therapy (CBT) interventions have the potential to address these risk factors and subsequently improve outcomes. This study aimed to evaluate the effect of a telephone-delivered cognitive-behavioral-based physical therapy (CBPT) program on physical function, pain, and general health at 12 months after hospital discharge following lower-extremity trauma. The CBPT program was hypothesized to improve outcomes compared with an education program. Methods: A multicenter, randomized controlled trial was conducted involving 325 patients who were 18 to 60 years of age and had at least 1 acute orthopaedic injury to the lower extremity or to the pelvis or acetabulum requiring operative fixation. Patients were recruited from 6 Level-I trauma centers and were screened and randomized to the CBPT program or the education program early after hospital discharge. The primary outcome was the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) scale. The secondary outcomes were objective physical function tests (4-square step test, timed stair ascent test, sit-to-stand test, and self-selected walking speed test), PROMIS Pain Intensity and Pain Interference, and the Veterans RAND 12-Item Health Survey. Treatment effects were calculated using targeted maximum likelihood estimation, a robust analytical approach appropriate for causal inference with longitudinal data. Results: The mean treatment effect on the 12-month baseline change in PROMIS PF was 0.94 (95% confidence interval, −0.68 to 2.64; p = 0.23). There were also no observed differences in secondary outcomes between the intervention group and the control group. 
Conclusions: The telephone-delivered CBPT did not appear to yield any benefits for patients with traumatic lower-extremity injuries in terms of physical function, pain intensity, pain interference, or general health. Improvements were observed in both groups, which questions the utility of telephone-delivered cognitive-behavioral strategies over educational programs. Level of Evidence: Therapeutic Level I. See Instructions for Authors for a complete description of levels of evidence.

2024-05-18 — Comparison of outcomes between off-pump and on-pump coronary artery bypass graft surgery using collaborative targeted maximum likelihood estimation

Authors: Hossein Ali Adineh, Kaveh Hoseini, I. Zareban, Arash Jalali, M. Nazemipour, M. Mansournia
Year: 2024
Publication Date: 2024-05-18
Venue: Scientific Reports
DOI: 10.1038/s41598-024-61846-1
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
There are some discrepancies about the superiority of the off-pump coronary artery bypass grafting (CABG) surgery over the conventional cardiopulmonary bypass (on-pump). The aim of this study was estimating risk ratio of mortality in the off-pump coronary bypass compared with the on-pump using a causal model known as collaborative targeted maximum likelihood estimation (C-TMLE). The data of the Tehran Heart Cohort study from 2007 to 2020 was used. A collaborative targeted maximum likelihood estimation and targeted maximum likelihood estimation, and propensity score (PS) adjustment methods were used to estimate causal risk ratio adjusting for the minimum sufficient set of confounders, and the results were compared. Among 24,883 participants (73.6% male), 5566 patients died during an average of 8.2 years of follow-up. The risk ratio estimates (95% confidence intervals) by unadjusted log-binomial regression model, PS adjustment, TMLE, and C-TMLE methods were 0.86 (0.78–0.95), 0.88 (0.80–0.97), 0.88 (0.80–0.97), and 0.87 (0.85–0.89), respectively. This study provides evidence for a protective effect of off-pump surgery on mortality risk for up to 8 years in diabetic and non-diabetic patients.

2024-05-15 — C-Learner: Constrained Learning for Causal Inference

Authors: T. Cai, Yuri Fonseca, Kaiwen Hou, Hongseok Namkoong
Year: 2024
Publication Date: 2024-05-15
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Popular debiased estimation methods for causal inference -- such as augmented inverse propensity weighting and targeted maximum likelihood estimation -- enjoy desirable asymptotic properties like statistical efficiency and double robustness but they can produce unstable estimates when there is limited overlap between treatment and control, requiring additional assumptions or ad hoc adjustments in practice (e.g., truncating propensity scores). In contrast, simple plug-in estimators are stable but lack desirable asymptotic properties. We propose a novel debiasing approach that achieves the best of both worlds, producing stable plug-in estimates with desirable asymptotic properties. Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero, and can leverage flexible model classes including neural networks and tree ensembles. In several experimental settings, including ones in which we handle text-based covariates by fine-tuning language models, our constrained learning-based estimator outperforms basic versions of one-step estimation and targeting in challenging settings with limited overlap between treatment and control, and performs similarly otherwise.

2024-05-12 — Adaptive-TMLE for the average treatment effect based on randomized controlled trial augmented with real-world data

Authors: Mark van der Laan, Sky Qiu, J. Tarp, Lars van der Laan
Year: 2024
Publication Date: 2024-05-12
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2024-0025
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
We consider the problem of estimating the average treatment effect (ATE) when both randomized control trial (RCT) data and external real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted maximum likelihood estimation (A-TMLE) framework to estimate them. We prove that the A-TMLE estimator is $\sqrt{n}$-consistent and asymptotically normal. Moreover, in finite sample, it achieves the super-efficiency one would obtain had one known the oracle model for the conditional effect of the RCT enrollment on the outcome. Consequently, the smaller and more parsimonious the working model of the bias induced by the RWD is, the greater our estimator’s efficiency, while our estimator will always be at least as efficient as an efficient estimator that uses the RCT data only. A-TMLE outperforms existing methods in simulations by having smaller mean-squared-error and 95% confidence intervals. We apply A-TMLE to augment the DEVOTE trial with external data from the Optum Clinformatics Data Mart, demonstrating its potential to establish treatment superiority in noninferiority trials. A-TMLE could utilize external RWD to help improve the power of randomized trials without biasing the estimates of intervention effects. This approach could allow for smaller, faster trials, decreasing the time until patients can receive effective treatments.

2024-05-06 — Implementasi Prediksi Siswa Dropout pada MOOC Menggunakan Metode Stacking Super Learner dalam Lingkungan Komputasi Berkinerja Tinggi

Authors: Mery Yulinda Rahmi, Arif Djunaidy, Izzat Aulia Akbar
Year: 2024
Publication Date: 2024-05-06
Venue: Jurnal Teknik ITS
DOI: 10.12962/j23373539.v13i1.129115
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-05-03 — Supervised Machine Learning for Bioelectrical Cellular Networks

Authors: Rajeev Jaundoo, T. Craddock, J. Tuszynski
Year: 2024
Publication Date: 2024-05-03
Venue: bioRxiv
DOI: 10.1101/2024.04.30.591880
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Cells utilize bioelectricity to form networks as well as regulate and control a variety of processes such as apoptosis, tumor suppression, and voltage-gated ion channels. In-silico modeling of bioelectrical networks can be performed using BETSE, an application that models gap junctions and ion channel activity of networked cells, but its usage of matrix-based differential equations to estimate these properties limits simulations based on the amount of computational resources available. To alleviate this issue, we trained a total of 8 machine learning models to replace three core functions of BETSE, that is, 1) predicting the average transmembrane potential (Vmem) of an entire cellular network, 2) predicting the Vmem of each individual cell within the network, and finally, 3) predicting the average ion concentrations of sodium, potassium, chloride, and calcium within the cell network. For objective 1, the random forest model was shown to be most performant over all 4 scoring metrics, in objective 2 both the decision tree and k-nearest neighbors models scored best in half of all metrics, and for objective 3 the super learner, a meta-learner comprised of multiple base learners, scored best among all scoring metrics. Overall, these models provide a more resource efficient method of predicting properties of bioelectric cellular networks, and future work will include further properties such as temperature and pressure.

2024-05-03 — A novel non-negative Bayesian stacking modeling method for Cancer survival prediction using high-dimensional omics data

Authors: Junjie Shen, Shuo Wang, Hao Sun, Jie Huang, Lu Bai, Xichao Wang, Yongfei Dong, Zaixiang Tang
Year: 2024
Publication Date: 2024-05-03
Venue: BMC Medical Research Methodology
DOI: 10.1186/s12874-024-02232-3
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Survival prediction using high-dimensional molecular data is a hot topic in the field of genomics and precision medicine, especially for cancer studies. Considering that carcinogenesis has a pathway-based pathogenesis, developing models using such group structures is a closer mimic of disease progression and prognosis. Many approaches can be used to integrate group information; however, most of them are single-model methods, which may account for unstable prediction. Methods: We introduced a novel survival stacking method that modeled using group structure information to improve the robustness of cancer survival prediction in the context of high-dimensional omics data. With a super learner, survival stacking combines the prediction from multiple sub-models that are independently trained using the features in pre-grouped biological pathways. In addition to a non-negative linear combination of sub-models, we extended the super learner to non-negative Bayesian hierarchical generalized linear model and artificial neural network. We compared the proposed modeling strategy with the widely used survival penalized method Lasso Cox and several group penalized methods, e.g., group Lasso Cox, via simulation study and real-world data application. Results: The proposed survival stacking method showed superior and robust performance in terms of discrimination compared with single-model methods in case of high-noise simulated data and real-world data. The non-negative Bayesian stacking method can identify important biological signal pathways and genes that are associated with the prognosis of cancer. Conclusions: This study proposed a novel survival stacking strategy incorporating biological group information into the cancer prognosis models. Additionally, this study extended the super learner to non-negative Bayesian model and ANN, enriching the combination of sub-models. The proposed Bayesian stacking strategy exhibited favorable properties in the prediction and interpretation of complex survival data, which may aid in discovering cancer targets.

2024-05-02 — Statistical methods to control for confounders in rare disease settings that use external control

Authors: Jiwei He, Di Zhang, Feng Li
Year: 2024
Publication Date: 2024-05-02
Venue: Journal of Biopharmaceutical Statistics
DOI: 10.1080/10543406.2024.2341650
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In the drug development for rare disease, the number of treated subjects in the clinical trial is often very small, whereas the number of external controls can be relatively large. There is no clear guidance on choosing an appropriate statistical method to control baseline confounding in this situation. To fill this gap, we conduct extensive simulations to evaluate the performance of commonly used matching and weighting methods as well as the more recently developed targeted maximum likelihood estimation (TMLE) and cardinality matching in small sample settings, mimicking the motivating data from a pediatric rare disease. Among the methods examined, the performance of coarsened exact matching (CEM) and TMLE are relatively robust under various model specifications. CEM is only feasible when the number of controls far exceeds the number of treated, whereas TMLE has better performance with less extreme treatment allocation ratios. Our simulations suggest bootstrap is useful for variance estimation in small samples after matching.

2024-05-02 — Design of a Novel Model for Predicting Type-2 Diabetes: Enhanced Ensemble Learning Incorporating Dimensionality Reduction Techniques

Authors: Pappu Chandra Roy, Sudhir Kumar Mishra
Year: 2024
Publication Date: 2024-05-02
Venue: 2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT)
DOI: 10.1109/InCACCT61598.2024.10551097
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The aim of this research is to explore an innovative model for forecasting type-2 diabetes, accomplished through the integration of collaborative methods and dimensionality reduction techniques to improve diagnostic precision. This innovative method culminates in the advance of the “Improved Ensemble Learning with Dimensionality Reduction” (IELDR) model, which combines cooperative methods and dimensionality lessening methods to elevate the accuracy of predicting type-2 diabetes. The procedure employed entails applying the IELDR model for type-2 diabetes calculation. This model influences an autoencoder for feature extraction and adopts a two-level assembling super learner framework to train robust cooperative models. The research methodology encompasses to confirmative the IELDR model with the LS-Diabetes dataset, demonstrating heightened organization precision, and supplementary authentication on the PIMA diabetes dataset and Diabetes-2019 dataset to establish the effectiveness of the method in early diagnosis. The research outcome offerings the IELDR model, which realized superior precision and minimalized error rates in forecasting type-2 diabetes, showcasing improved performance associated to other classifiers and feature collection procedures and emphasizing its efficiency in early diagnosis. The practical implication of this study lies in the making of the IELDR model, contribution an effective means to forecast the risk of type 2 diabetes progress in healthy individuals based on their current lifestyle patterns. However, the study’s opportunity is constrained by a relatively inadequate sample size of type-2 diabetes patients undergoing stress, representing the necessity for more widespread research to distinguish dissimilarities in stress levels.

2024-05-01 — Powerless in the storm: Severe weather-driven power outages in New York State, 2017–2020

Authors: Nina M. Flores, Alexander J. Northrop, Vivian Do, Milo Gordon, Yazhou Jiang, Kara E. Rudolph, Diana Hernández, Joan A. Casey
Year: 2024
Publication Date: 2024-05-01
Venue: PLOS Climate
DOI: 10.1371/journal.pclm.0000364
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
The vulnerability of the power grid to severe weather events is a critical issue as climate change is expected to increase extreme events, which can damage components of the power grid and/or lessen electrical power supply, resulting in power outages. However, largely due to an absence of granular spatiotemporal outage data, we lack a robust understanding of how severe weather-driven outages, their community impacts, and their durations distribute across space and socioeconomic vulnerability. Here, we pair hourly power outage data in electrical power operating localities (n = 1865) throughout NYS with urbanicity, CDC Social Vulnerability Index, and hourly weather (temperature, precipitation, wind speed, lightning strike, snowfall) data. We used these data to characterize the impact of extreme weather events on power outages from 2017–2020, while considering neighborhood vulnerability factors. Specifically, we assess (a) the lagged effect of severe weather on power outages, (b) common combinations of severe weather types contributing to outages, (c) the spatial distribution of the severe weather-driven outages, and (d) disparities in severe weather-driven outages by degree of community social vulnerability. We found that across NYS, 39.9% of all outages co-occurred with severe weather. However, certain regions, including eastern Queens, upper Manhattan and the Bronx of NYC, the Hudson Valley, and Adirondack regions were more burdened with severe weather-driven outages. Using targeted maximum likelihood estimation, we found that the frequency of heat-, precipitation-, and wind-driven outages disproportionately impacted vulnerable communities in NYC. When comparing durations of outages, we found that in rural regions, precipitation- and snow-driven outages lasted the longest in vulnerable communities. 
Under a shifting climate, anticipated increases in power outages will differentially burden communities due to regional heterogeneity in severe weather event severity, grid preparedness, and population socioeconomic profiles/vulnerabilities. As such, policymakers must consider these characteristics to inform equitable grid management and improvements.

2024-05-01 — Environmental risk score of multiple pollutants for kidney damage among residents in vulnerable areas by occupational chemical exposure in Korea

Authors: Hyun A Jang, Kyung-Hwa Choi, Yong Min Cho, Dahee Han, Y. Hong
Year: 2024
Publication Date: 2024-05-01
Venue: Environmental science and pollution research international
DOI: 10.1007/s11356-024-33567-5
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study aimed to develop an environmental risk score (ERS) of multiple pollutants (MP) causing kidney damage (KD) in Korean residents near abandoned metal mines or smelters and evaluate the association between ERS and KD by a history of occupational chemical exposure (OCE). Exposure to MP, consisting of nine metals, four polycyclic aromatic hydrocarbons, and four volatile organic compounds, was measured as urinary metabolites. The study participants were recruited from the Forensic Research via Omics Markers (FROM) study (n = 256). Beta-2-microglobulin (β2-MG), N-acetyl-β-D-glucosaminidase (NAG), and estimated glomerular filtration rate (eGFR) were used as biomarkers of KD. Bayesian kernel machine regression (BKMR) was selected as the optimal ERS model with the best performance and stability of the predicted effect size among the elastic net, adaptive elastic net, weighted quantile sum regression, BKMR, Bayesian additive regression tree, and super learner model. Variable importance was estimated to evaluate the effects of metabolites on KD. When stratified with the history of OCE after adjusting for several confounding factors, the risks for KD were higher in the OCE group than those in the non-OCE group; the odds ratio (OR; 95% CI) for ERS in non-OCE and OCE groups were 2.97 (2.19, 4.02) and 6.43 (2.85, 14.5) for β2-MG, 1.37 (1.01, 1.86) and 4.16 (1.85, 9.39) for NAG, and 4.57 (3.37, 6.19) and 6.44 (2.85, 14.5) for eGFR, respectively. We found that the ERS stratified history of OCE was the most suitable for evaluating the association between MP and KD, and the risks were higher in the OCE group than those in the non-OCE group.

2024-05-01 — Bridging Differences in Cohort Analyses of the Relationship between Secondhand Smoke Exposure during Pregnancy and Birth Weight: The Transportability Framework in the ECHO Program

Authors: Andreas M. Neophytou, Jenny Aalborg, S. Magzamen, B. Moore, A. Ferrara, M. Karagas, L. Trasande, D. Dabelea
Year: 2024
Publication Date: 2024-05-01
Venue: Environmental Health Perspectives
DOI: 10.1289/EHP13961
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Estimates for the effects of environmental exposures on health outcomes, including secondhand smoke (SHS) exposure, often present considerable variability across studies. Knowledge of the reasons behind these differences can aid our understanding of effects in specific populations as well as inform practices of combining data from multiple studies. Objectives: This study aimed to assess the presence of effect modification by measured sociodemographic characteristics on the effect of SHS exposure during pregnancy on birth weights that may drive differences observed across cohorts. We also aimed to quantify the extent to which differences in the cohort mean effects observed across cohorts in the Environmental influences on Child Health Outcomes (ECHO) consortium are due to differing distributions of these characteristics. Methods: We assessed the presence of effect modification and transportability of effect estimates across five ECHO cohorts in a total of 6,771 mother–offspring dyads. We assessed the presence of effect modification via gradient boosting of regression trees based on the H-statistic. We estimated individual cohort effects using linear models and targeted maximum likelihood estimation (TMLE). We then estimated transported effects from one cohort to each of the remaining cohorts using a robust nonparametric estimation approach relying on TMLE estimators and compared them to the original effect estimates for these cohorts. Results: Observed effect estimates varied across the five cohorts, ranging from significantly lower birth weight associated with exposure [−167.3g; 95% confidence interval (CI): −270.4, −64.1] to higher birth weight with wide CIs, including the null (42.4g; 95% CI: −15.0, 99.8). Transported effect estimates only minimally explained differences in the point estimates for two out of the four cohort pairs. 
Discussion: Our findings of weak to moderate evidence of effect modification and transportability indicate that unmeasured individual-level and contextual factors and sources of bias may be responsible for differences in the effect estimates observed across ECHO cohorts. https://doi.org/10.1289/EHP13961

2024-05-01 — A super learner ensemble to map potassium fixation in California vineyard soils

Authors: Stewart G. Wilson, Gordon L. Rees, A. O’Geen
Year: 2024
Publication Date: 2024-05-01
Venue: Geoderma
DOI: 10.1016/j.geoderma.2024.116824
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-05-01 — A comprehensive analysis of key factors' impact on environmental performance: Evidence from Globe by novel super learner algorithm.

Authors: M. Kartal, Özer Depren, Serpil Kılıç Depren
Year: 2024
Publication Date: 2024-05-01
Venue: Journal of Environmental Management
DOI: 10.1016/j.jenvman.2024.121040
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2024-04-29 — Development of Machine Learning Models for Predicting Bubble-Point Pressure of Crude Oils

Authors: Prosper Nekekpemi, Michael Totaro, O. Olayiwola, Pascal Esenenjor
Year: 2024
Publication Date: 2024-04-29
Venue: Day 4 Thu, May 09, 2024
DOI: 10.4043/35276-ms
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Bubble-point pressure is a crucial parameter in reservoir and production engineering in the oil and gas industry, but its accurate determination through experimental methods is both costly and time-consuming. Alternative approaches, such as equations of state and empirical correlations like Al Marhoun, Dokla and Osman, Glaso, Standing, and Vazquez and Beggs, are commonly used but suffer from limitations including their inability to capture complex, non-linear relationships and adapt to new or high-dimensional data. This study aims to address these shortcomings by developing and evaluating a range of machine learning models—including Decision Tree, Linear Regression, Random Forest, Support Vector Regression (SVR), K-Nearest Neighbors (KNN), AdaBoosting, Gradient Boosting, Stacked Super Learner, and Multilayer Perceptron Neural Network (MLPNN)—for predicting bubble-point pressure as a function of the reservoir temperature, gas gravity, solution gas-oil ratio, and oil gravity (API). Utilizing a comprehensive dataset derived from different published papers, a total of 776 data sets were used in this study which were divided into 80% for training and 20% for testing. The study employed performance metrics such as Average Percentage Relative Error (APRE), Absolute Average Percentage Relative Error (AAPRE), Root Mean Square Error (RMSE), and Coefficient of Determination for evaluation. The Gradient Boosting model emerged as the most effective, with an RMSE of 364.027 and an R2 of 0.924 on the test data, outperforming the existing correlations used in this study. The results demonstrate the potential of machine learning models, particularly the Gradient Boosting model, in offering advantages such as capturing complex relationships thereby contributing to more effective reservoir management strategies.

2024-04-27 — Impact of LS Mutation on Pharmacokinetics of Preventive HIV Broadly Neutralizing Monoclonal Antibodies: A Cross-Protocol Analysis of 16 Clinical Trials in People without HIV

Authors: Bryan T. Mayer, Lily Zhang, Allan C. deCamp, Chenchen Yu, Alicia H Sato, Heather Angier, K. Seaton, N. Yates, J. Ledgerwood, Kenneth H. Mayer, Marina Caskey, M. Nussenzweig, Kathryn E. Stephenson, B. Julg, D. Barouch, M. Sobieszczyk, Srilatha Edupuganti, Colleen F. Kelley, M. McElrath, H. Gelderblom, Michael N. Pensiero, A. McDermott, L. Gama, R. Koup, Peter B. Gilbert, Myron S Cohen, Lawrence Corey, O. Hyrien, Georgia D. Tomaras, Yunda Huang
Year: 2024
Publication Date: 2024-04-27
Venue: Pharmaceutics
DOI: 10.3390/pharmaceutics16050594
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Monoclonal antibodies are commonly engineered with an introduction of Met428Leu and Asn434Ser, known as the LS mutation, in the fragment crystallizable region to improve pharmacokinetic profiles. The LS mutation delays antibody clearance by enhancing binding affinity to the neonatal fragment crystallizable receptor found on endothelial cells. To characterize the LS mutation for monoclonal antibodies targeting HIV, we compared pharmacokinetic parameters between parental versus LS variants for five pairs of anti-HIV immunoglobulin G1 monoclonal antibodies (VRC01/LS/VRC07-523LS, 3BNC117/LS, PGDM1400/LS, PGT121/LS, 10-1074/LS), analyzing data from 16 clinical trials of 583 participants without HIV. We described serum concentrations of these monoclonal antibodies following intravenous or subcutaneous administration by an open two-compartment disposition, with first-order elimination from the central compartment using non-linear mixed effects pharmacokinetic models. We compared estimated pharmacokinetic parameters using the targeted maximum likelihood estimation method, accounting for participant differences. We observed lower clearance rate, central volume, and peripheral volume of distribution for all LS variants compared to parental monoclonal antibodies. LS monoclonal antibodies showed several improvements in pharmacokinetic parameters, including increases in the elimination half-life by 2.7- to 4.1-fold, the dose-normalized area-under-the-curve by 4.1- to 9.5-fold, and the predicted concentration at 4 weeks post-administration by 3.4- to 7.6-fold. Results suggest a favorable pharmacokinetic profile of LS variants regardless of HIV epitope specificity. Insights support lower dosages and/or less frequent dosing of LS variants to achieve similar levels of antibody exposure in future clinical applications.

2024-04-26 — Pseudo-observations and super learner for the estimation of the restricted mean survival time

Authors: Ariane Cwiling, Vittorio Perduca, Olivier Bouaziz
Year: 2024
Publication Date: 2024-04-26
Venue: Lifetime Data Analysis
DOI: 10.1007/s10985-025-09668-9
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In the context of right-censored data, we study the problem of predicting the restricted time to event based on a set of covariates. Under a quadratic loss, this problem is equivalent to estimating the conditional restricted mean survival time (RMST). To that aim, we propose a flexible and easy-to-use ensemble algorithm that combines pseudo-observations and super learner. The classical theoretical results of the super learner are extended to right-censored data, using a new definition of pseudo-observations, the so-called split pseudo-observations. Simulation studies indicate that the split pseudo-observations and the standard pseudo-observations are similar even for small sample sizes. The method is applied to maintenance and colon cancer datasets, showing the interest of the method in practice, as compared to other prediction methods. We complement the predictions obtained from our method with our RMST-adapted risk measure, prediction intervals and variable importance measures developed in a previous work.

2024-04-16 — Optimizing cardiovascular disease mortality prediction: a super learner approach in the tehran lipid and glucose study

Authors: P. Darabi, S. Gharibzadeh, D. Khalili, Mehrdad Bagherpour-Kalo, Leila Janani
Year: 2024
Publication Date: 2024-04-16
Venue: BMC Medical Informatics and Decision Making
DOI: 10.1186/s12911-024-02489-0
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Cardiovascular disease (CVD) is the most important cause of death in the world and has a potential impact on health care costs. This study aimed to evaluate the performance of machine learning survival models and determine the optimum model for predicting CVD-related mortality. In this study, the research population was all participants in Tehran Lipid and Glucose Study (TLGS) aged over 30 years. We used the Gradient Boosting model (GBM), Support Vector Machine (SVM), Super Learner (SL), and Cox proportional hazard (Cox-PH) models to predict the CVD-related mortality using 26 features. The dataset was randomly divided into training (80%) and testing (20%). To evaluate the performance of the methods, we used the Brier Score (BS), Prediction Error (PE), Concordance Index (C-index), and time-dependent Area Under the Curve (TD-AUC) criteria. Four different clinical models were also performed to improve the performance of the methods. Out of 9258 participants with a mean age of (SD; range) 43.74 (15.51; 20–91), 56.60% were female. The CVD death proportion was 2.5% (228 participants). The death proportion was significantly higher in men (67.98% M, 32.02% F). Based on predefined selection criteria, the SL method has the best performance in predicting CVD-related mortality (TD-AUC > 93.50%). Among the machine learning (ML) methods, the SVM has the worst performance (TD-AUC = 90.13%). According to the relative effect, age, fasting blood sugar, systolic blood pressure, smoking, taking aspirin, diastolic blood pressure, Type 2 diabetes mellitus, hip circumference, body mass index (BMI), and triglyceride were identified as the most influential variables in predicting CVD-related mortality. According to the results of our study, compared to the Cox-PH model, Machine Learning models showed promising and sometimes better performance in predicting CVD-related mortality. This finding is based on the analysis of a large and diverse urban population from Tehran, Iran.

2024-04-15 — Predicting the risks of kidney failure and death in adults with moderate to severe chronic kidney disease: multinational, longitudinal, population based, cohort study

Authors: Ping Liu, Simon Sawhney, U. Heide-Jørgensen, Robert R. Quinn, S. Jensen, Andrew Mclean, C. Christiansen, T. Gerds, Pietro Ravani
Year: 2024
Publication Date: 2024-04-15
Venue: British Medical Journal
DOI: 10.1136/bmj-2023-078063
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Objective To train and test a super learner strategy for risk prediction of kidney failure and mortality in people with incident moderate to severe chronic kidney disease (stage G3b to G4). Design Multinational, longitudinal, population based, cohort study. Settings Linked population health data from Canada (training and temporal testing), and Denmark and Scotland (geographical testing). Participants People with newly recorded chronic kidney disease at stage G3b-G4, estimated glomerular filtration rate (eGFR) 15-44 mL/min/1.73 m2. Modelling The super learner algorithm selected the best performing regression models or machine learning algorithms (learners) based on their ability to predict kidney failure and mortality with minimised cross-validated prediction error (Brier score, the lower the better). Prespecified learners included age, sex, eGFR, albuminuria, with or without diabetes, and cardiovascular disease. The index of prediction accuracy, a measure of calibration and discrimination calculated from the Brier score (the higher the better) was used to compare KDpredict with the benchmark, kidney failure risk equation, which does not account for the competing risk of death, and to evaluate the performance of KDpredict mortality models. Results 67 942 Canadians, 17 528 Danish, and 7740 Scottish residents with chronic kidney disease at stage G3b to G4 were included (median age 77-80 years; median eGFR 39 mL/min/1.73 m2). Median follow-up times were five to six years in all cohorts. Rates were 0.8-1.1 per 100 person years for kidney failure and 10-12 per 100 person years for death. KDpredict was more accurate than kidney failure risk equation in prediction of kidney failure risk: five year index of prediction accuracy 27.8% (95% confidence interval 25.2% to 30.6%) versus 18.1% (15.7% to 20.4%) in Denmark and 30.5% (27.8% to 33.5%) versus 14.2% (12.0% to 16.5%) in Scotland.
Predictions from kidney failure risk equation and KDpredict differed substantially, potentially leading to diverging treatment decisions. An 80-year-old man with an eGFR of 30 mL/min/1.73 m2 and an albumin-to-creatinine ratio of 100 mg/g (11 mg/mmol) would receive a five year kidney failure risk prediction of 10% from kidney failure risk equation (above the current nephrology referral threshold of 5%). The same man would receive five year risk predictions of 2% for kidney failure and 57% for mortality from KDpredict. Individual risk predictions from KDpredict with four or six variables were accurate for both outcomes. The KDpredict models retrained using older data provided accurate predictions when tested in temporally distinct, more recent data. Conclusions KDpredict could be incorporated into electronic medical records or accessed online to accurately predict the risks of kidney failure and death in people with moderate to severe CKD. The KDpredict learning strategy is designed to be adapted to local needs and regularly revised over time to account for changes in the underlying health system and care processes.
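
The index of prediction accuracy mentioned in the abstract above rescales the Brier score against a null model that predicts the overall event rate for everyone. A minimal Python sketch for the simple binary, uncensored case follows. The function names and toy data are illustrative assumptions, and the sketch does not handle the censoring or competing risks that the study itself accounts for.

```python
def brier_score(predicted_risks, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted_risks, outcomes)) / len(outcomes)


def index_of_prediction_accuracy(predicted_risks, outcomes):
    """1 - Brier(model) / Brier(null); higher is better, 0 means no gain over the null model."""
    prevalence = sum(outcomes) / len(outcomes)
    # Null model: predict the observed event rate for every individual.
    null_brier = brier_score([prevalence] * len(outcomes), outcomes)
    return 1.0 - brier_score(predicted_risks, outcomes) / null_brier


# Toy data (made up for illustration only).
outcomes = [1, 0, 0, 1, 0, 0, 0, 1]
risks = [0.8, 0.1, 0.2, 0.7, 0.3, 0.1, 0.2, 0.6]

model_brier = brier_score(risks, outcomes)
ipa = index_of_prediction_accuracy(risks, outcomes)
print(f"Brier score: {model_brier:.3f}, IPA: {ipa:.3f}")
```

A lower Brier score gives a higher index, which matches the "lower the better" and "higher the better" directions stated in the abstract.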

2024-04-15 — Predicting the risks of kidney failure and death in adults with moderate to severe chronic kidney disease

Authors: T. Gerds, Pietro Ravani
Year: 2024
Publication Date: 2024-04-15
Venue: British Medical Journal
DOI: 10.1136/bmj.q721
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The risk prediction model for kidney failure and death in people with chronic kidney disease (CKD) presented in the linked study is a super learner. A super learner is an algorithm that repeatedly splits the data into training and test sets and then chooses the best performing model from a list of candidate prediction models. This article describes why and how the super learner was implemented in the linked study.
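
The selection procedure described in this abstract (often called a discrete super learner) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the linked study: the two candidate learners, the squared-error loss, the fold scheme, and the simulated data are all hypothetical choices made here for brevity.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate 2: simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_risk(fit, xs, ys, k=5):
    """Cross-validated mean squared error: train on k-1 folds, test on the held-out fold."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]
    total = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    return total / n


def discrete_super_learner(candidates, xs, ys, k=5):
    """Pick the candidate with the lowest cross-validated risk and refit it on all data."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks, candidates[best](xs, ys)


# Simulated data (assumption): a noisy straight line.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

candidates = {"mean": fit_mean, "linear": fit_linear}
best_name, cv_risks, final_model = discrete_super_learner(candidates, xs, ys)
print(best_name, cv_risks)
```

Other super learner variants combine the candidates with weights instead of selecting a single winner; the sketch covers only the selection variant that the abstract describes.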

2024-04-12 — Shifting the paradigm: Estimating heterogeneous treatment effects in the development of walkable cities design

Authors: Jie Zhu, Bojing Liao
Year: 2024
Publication Date: 2024-04-12
Venue: Environment and Planning B Urban Analytics and City Science
DOI: 10.1177/23998083251337810
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Urban transformations driven by population growth and aging have significantly impacted public health and well-being. Estimating the heterogeneous effects of urban design interventions remains challenging, as traditional methods, such as questionnaires and stated preferences, suffer from recall bias and oversimplify interactions between environmental attributes and individual characteristics. This study addresses these limitations by integrating Virtual Reality (VR) with Targeted Maximum Likelihood Estimation (TMLE) to generate robust causal estimates. VR provides an immersive, controlled platform for capturing perceptual and experiential responses, while TMLE mitigates selection bias and estimates Conditional Average Treatment Effects (CATE) across demographic subgroups. Our findings reveal heterogeneous impacts of key urban design attributes (land use mix, block connectivity, road size, open space, and greenery) on perceived walkability and emotional responses. Open space and greenery consistently produced positive effects, while interaction effects between attributes highlighted the need for context-sensitive planning. By applying TMLE to VR-based conjoint experiments, this study advances built environment research and provides actionable insights for public health policy, emphasizing the importance of personalized urban design strategies that foster equitable, health-supportive environments.

2024-04-09 — Discovering Subgroups of Children With High Mortality in Urban Guinea-Bissau: Exploratory and Validation Cohort Study

Authors: A. Rieckmann, S. Nielsen, P. Dworzynski, H. Amini, S. W. Mogensen, Isaquel Silva, Angela Y. Chang, Oyebuchi A Arah, Wojciech Samek, N. H. Rod, C. T. Ekstrøm, C. Benn, P. Aaby, A. Fisker
Year: 2024
Publication Date: 2024-04-09
Venue: JMIR Public Health and Surveillance
DOI: 10.2196/48060
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background The decline in global child mortality is an important public health achievement, yet child mortality remains disproportionally high in many low-income countries like Guinea-Bissau. The persisting high mortality rates necessitate targeted research to identify vulnerable subgroups of children and formulate effective interventions. Objective This study aimed to discover subgroups of children at an elevated risk of mortality in the urban setting of Bissau, Guinea-Bissau, West Africa. By identifying these groups, we intend to provide a foundation for developing targeted health interventions and inform public health policy. Methods We used data from the health and demographic surveillance site, Bandim Health Project, covering 2003 to 2019. We identified baseline variables recorded before children reached the age of 6 weeks. The focus was on determining factors consistently linked with increased mortality up to the age of 3 years. Our multifaceted methodological approach incorporated spatial analysis for visualizing geographical variations in mortality risk, causally adjusted regression analysis to single out specific risk factors, and machine learning techniques for identifying clusters of multifactorial risk factors. To ensure robustness and validity, we divided the data set temporally, assessing the persistence of identified subgroups over different periods. The reassessment of mortality risk used the targeted maximum likelihood estimation (TMLE) method to achieve more robust causal modeling. Results We analyzed data from 21,005 children. The mortality risk (6 weeks to 3 years of age) was 5.2% (95% CI 4.8%-5.6%) for children born between 2003 and 2011, and 2.9% (95% CI 2.5%-3.3%) for children born between 2012 and 2016. 
Our findings revealed 3 distinct high-risk subgroups with notably higher mortality rates: children residing in a specific urban area (adjusted mortality risk difference of 3.4%, 95% CI 0.3%-6.5%), children born to mothers with no prenatal consultations (adjusted mortality risk difference of 5.8%, 95% CI 2.6%-8.9%), and children from polygamous families born during the dry season (adjusted mortality risk difference of 1.7%, 95% CI 0.4%-2.9%). These subgroups, though small, showed a consistent pattern of higher mortality risk over time. Common social and economic factors were linked to a larger share of the total child deaths. Conclusions The study’s results underscore the need for targeted interventions to address the specific risks faced by these identified high-risk subgroups. These interventions should be designed to complement broader public health strategies, creating a comprehensive approach to reducing child mortality. We suggest future research that focuses on developing, testing, and comparing targeted intervention strategies unraveling the proposed hypotheses found in this study. The ultimate aim is to optimize health outcomes for all children in high-mortality settings, leveraging a strategic mix of targeted and general health interventions to address the varied needs of different child subgroups.

2024-04-05 — Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

Authors: Toru Shirakawa, Yi Li, Yulun Wu, Sky Qiu, Yuxuan Li, Mingduo Zhao, Hiroyasu Iso, M. V. D. Laan
Year: 2024
Publication Date: 2024-04-05
Venue: International Conference on Machine Learning
DOI: 10.48550/arXiv.2404.04399
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the mean of counterfactual outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning. After obtaining an initial estimate using the transformer, following the targeted minimum loss-based likelihood estimation (TMLE) framework, we statistically corrected for the bias commonly associated with machine learning algorithms. Furthermore, our method also facilitates statistical inference by enabling the provision of 95% confidence intervals grounded in asymptotic statistical theory. Simulation results demonstrate our method's superior performance over existing approaches, particularly in complex, long time-horizon scenarios. It remains effective in small-sample, short-duration contexts, matching the performance of asymptotically efficient estimators. To demonstrate our method in practice, we applied our method to estimate counterfactual mean outcomes for standard versus intensive blood pressure management strategies in a real-world cardiovascular epidemiology cohort study.

2024-04-02 — Nonparametric efficient causal estimation of the intervention-specific expected number of recurrent events with continuous-time targeted maximum likelihood and highly adaptive lasso estimation

Authors: H. Rytgaard, M. J. Laan
Year: 2024
Publication Date: 2024-04-02
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation

Abstract:
Longitudinal settings involving outcome, competing risks and censoring events occurring and recurring in continuous time are common in medical research, but are often analyzed with methods that do not allow for taking post-baseline information into account. In this work, we define statistical and causal target parameters via the g-computation formula by carrying out interventions directly on the product integral representing the observed data distribution in a continuous-time counting process model framework. In recurrent events settings our target parameter identifies the expected number of recurrent events also in settings where the censoring mechanism or post-baseline treatment decisions depend on past information of post-baseline covariates such as the recurrent event process. We propose a flexible estimation procedure based on targeted maximum likelihood estimation coupled with highly adaptive lasso estimation to provide a novel approach for double robust and nonparametric inference for the considered target parameter. We illustrate the methods in a simulation study.

2024-04-01 — Influence of Temperature and Precipitation on the Effectiveness of Water, Sanitation, and Handwashing Interventions against Childhood Diarrheal Disease in Rural Bangladesh: A Reanalysis of the WASH Benefits Bangladesh Trial

Authors: A. Nguyen, J. Grembi, Marie Riviere, Gabriella Barratt Heitmann, William D. Hutson, T. Athni, Arusha Patil, A. Ercumen, A. Lin, Y. Crider, Andrew N. Mertens, L. Unicomb, Mahbubur Rahman, Stephen P Luby, Benjamin F. Arnold, J. Benjamin-Chung
Year: 2024
Publication Date: 2024-04-01
Venue: Environmental Health Perspectives
DOI: 10.1289/EHP13807
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Diarrheal disease is a leading cause of childhood morbidity and mortality globally. Household water, sanitation, and handwashing (WASH) interventions can reduce exposure to diarrhea-causing pathogens, but meteorological factors may impact their effectiveness. Information about effect heterogeneity under different weather conditions is critical to refining these targeted interventions. Objectives: We aimed to determine whether temperature and precipitation modified the effect of low-cost, point-of-use WASH interventions on child diarrhea. Methods: We analyzed data from a trial in rural Bangladesh that compared child diarrhea prevalence between clusters (N=720) that were randomized to different WASH interventions between 2012 and 2016 (NCT01590095). We matched temperature and precipitation measurements to diarrhea outcomes (N=12,440 measurements, 6,921 children) by geographic coordinates and date. We estimated prevalence ratios (PRs) using generalized additive models and targeted maximum likelihood estimation to assess the effectiveness of each WASH intervention under different weather conditions. Results: Generally, WASH interventions most effectively prevented diarrhea during monsoon season, particularly following weeks with heavy rain or high temperatures. The PR for diarrhea in the WASH interventions group compared with the control group was 0.49 (95% CI: 0.35, 0.68) after 1 d of heavy rainfall, with a less-protective effect [PR=0.87 (95% CI: 0.60, 1.25)] when there were no days with heavy rainfall. Similarly, the PR for diarrhea in the WASH intervention group compared with the control group was 0.60 (95% CI: 0.48, 0.75) following above-median temperatures vs. 0.91 (95% CI: 0.61, 1.35) following below-median temperatures. The influence of precipitation and temperature varied by intervention type; for precipitation, the largest differences in effectiveness were for the sanitation and combined WASH interventions.
Discussion: WASH intervention effectiveness was strongly influenced by precipitation and temperature, and nearly all protective effects were observed during the rainy season. Future implementation of these interventions should consider local environmental conditions to maximize effectiveness, including targeted efforts to maintain latrines and promote community adoption ahead of monsoon seasons. https://doi.org/10.1289/EHP13807

2024-03-27 — Predicting harmful alcohol use prevalence in Sub-Saharan Africa between 2015 and 2019: Evidence from population-based HIV impact assessment

Authors: M. Goma, W. Ng’ambi, Cosmas Zyambo
Year: 2024
Publication Date: 2024-03-27
Venue: medRxiv
DOI: 10.1371/journal.pone.0301735
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Introduction: Harmful alcohol use is associated with significant risks to public health outcomes worldwide. Although data on harmful alcohol use have been collected by population-based HIV Impact Assessment (PHIA), there is a dearth of analysis on the effect of HIV/ART status on harmful alcohol use in the SSA countries with PHIA surveys. This study uses data from the nationally representative PHIA to predict the harmful alcohol use prevalence. Methods: A secondary analysis of the PHIA surveys: Namibia (n=27,382), Tanzania (n=1807), Zambia (n=2268), Zimbabwe (n=3418), Malawi (n=2098), and Eswatini (n=2762). Using R version 4.2, the outcome variable and the descriptive variables were tested for association using the chi-square test. Multivariable logistic regression analysis was used to identify significant variables associated with harmful alcohol use. We tested and applied machine learning (ML) methods, namely Super Learner, Decision Tree, Random Forest (RF), Lasso Regression, Sample mean, and Gradient boosting. Evaluation metrics, specifically the confusion matrix, accuracy, precision, recall, F1 score, and Area under the Receiver Operating Characteristics (AUROC), were used to evaluate the performance of predictive models. The cutoff point for statistical significance was P<0.05. Results: Of the 12,460 persons, 15% used alcohol harmfully. Harmful alcohol use varied by country and ranged from 8.7% in Malawi to 26.1% in Namibia (P<0.001). Females were less likely to use alcohol in a harmful way (AOR = 0.32, 95% CI: 0.29-0.35, P< 0.001). Compared to those HIV negative, persons who were HIV-positive and on ART were less likely to use alcohol in a harmful way (AOR = 0.65, 95% CI: 0.57-0.73, P<0.001); however, persons who were HIV-positive and not on ART were more likely to use alcohol in a harmful way (OR = 1.49, 95% CI: 1.32-1.69, P<0.001). Being married or formally married was protective against harmful use of alcohol.
The best performing models were Lasso, Super Learner, and Random Forest, while gradient boosting and the sample mean did not perform well. Conclusion: The findings highlight concerning variations in harmful alcohol use prevalence across surveyed countries, with Namibia reporting the highest rate. Males, older individuals, those HIV positive and not yet on ART, and unmarried persons demonstrated a higher likelihood of engaging in harmful alcohol use. These findings collectively contribute to a comprehensive understanding of the multiple factors influencing harmful alcohol use within the surveyed populations and underscore the importance of targeted interventions at country and individual levels.

2024-03-27 — Heterogeneous Association of Tooth Loss with Functional Limitations

Authors: Y. Matsuyama, J. Aida, K. Kondo, K. Shiba
Year: 2024
Publication Date: 2024-03-27
Venue: Journal of Dental Research
DOI: 10.1177/00220345241226957
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Tooth loss is prevalent in older adults and associated with functional capacity decline. Studies on the susceptibility of some individuals to the effects of tooth loss are lacking. This study aimed to investigate the heterogeneity of the association between tooth loss and higher-level functional capacity in older Japanese individuals employing a machine learning approach. This is a prospective cohort study using the data of adults aged ≥65 y in Japan (N = 16,553). Higher-level functional capacity, comprising instrumental independence, intellectual activity, and social role, was evaluated using the Tokyo Metropolitan Institute of Gerontology Index of Competence (TMIG-IC). The scale ranged from 0 (lowest function) to 13 (highest function). Doubly robust targeted maximum likelihood estimation was used to estimate the population-average association between tooth loss (having <20 natural teeth) and TMIG-IC total score after 6 y. The heterogeneity of the association was evaluated by estimating conditional average treatment effects (CATEs) using the causal forest algorithm. The result showed that tooth loss was statistically significantly associated with lower TMIG-IC total scores (population-average effect: −0.14; 95% confidence interval, −0.18 to −0.09). The causal forest analysis revealed the heterogeneous associations between tooth loss and lower TMIG-IC total score after 6 y (median of estimated CATEs = −0.13; interquartile range = 0.12). The high-impact subgroup (i.e., individuals with estimated CATEs of the bottom 10%) were significantly more likely to be older and male, had a lower socioeconomic status, did not have a partner, and had poor health conditions compared with the low-impact subgroup (i.e., individuals with estimated CATEs of the top 10%). This study found that heterogeneity exists in the association between tooth loss and lower scores on functional capacity. 
Implementing tooth loss prevention policy and clinical measures, especially among vulnerable subpopulations significantly affected by tooth loss, may reduce its burden more effectively.

2024-03-27 — Association of Premorbid GLP-1RA and SGLT-2i Prescription Alone and in Combination with COVID-19 Severity

Authors: Klara R Klein, T. Abrahamsen, A. Kahkoska, G. C. Alexander, C. Chute, Melissa A. Haendel, Stephanie S Hong, Hemalkumar B Mehta, Richard Moffitt, Til Stürmer, Kajsa Kvist, J. Buse
Year: 2024
Publication Date: 2024-03-27
Venue: Diabetes Therapy
DOI: 10.1007/s13300-024-01562-1
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
People with type 2 diabetes are at heightened risk for severe outcomes related to COVID-19 infection, including hospitalization, intensive care unit admission, and mortality. This study was designed to examine the impact of premorbid use of glucagon-like peptide-1 receptor agonist (GLP-1RA) monotherapy, sodium-glucose cotransporter-2 inhibitor (SGLT-2i) monotherapy, and concomitant GLP-1RA/SGLT-2i therapy on the severity of outcomes in individuals with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Utilizing observational data from the National COVID Cohort Collaborative through September 2022, we compared outcomes in 78,806 individuals with a prescription of GLP-1RA and SGLT-2i versus a prescription of dipeptidyl peptidase 4 inhibitors (DPP-4i) within 24 months of a positive SARS-CoV-2 PCR test. We also compared concomitant GLP-1RA/SGLT-2i therapy to GLP-1RA and SGLT-2i monotherapy. The primary outcome was 60-day mortality, measured from the positive test date. Secondary outcomes included emergency room (ER) visits, hospitalization, and mechanical ventilation within 14 days. Using a super learner approach and accounting for baseline characteristics, associations were quantified with odds ratios (OR) estimated with targeted maximum likelihood estimation (TMLE). Use of GLP-1RA (OR 0.64, 95% confidence interval [CI] 0.56–0.72) and SGLT-2i (OR 0.62, 95% CI 0.57–0.68) were associated with lower odds of 60-day mortality compared to DPP-4i use. Additionally, the OR of ER visits and hospitalizations were similarly reduced with GLP-1RA and SGLT-2i use. Concomitant GLP-1RA/SGLT-2i use showed similar odds of 60-day mortality when compared to GLP-1RA or SGLT-2i use alone (OR 0.92, 95% CI 0.81–1.05 and OR 0.88, 95% CI 0.76–1.01, respectively). However, lower OR of all secondary outcomes were associated with concomitant GLP-1RA/SGLT-2i use when compared to SGLT-2i use alone.
Among adults who tested positive for SARS-CoV-2, premorbid use of either GLP-1RA or SGLT-2i is associated with lower odds of mortality compared to DPP-4i. Furthermore, concomitant use of GLP-1RA and SGLT-2i is linked to lower odds of other severe COVID-19 outcomes, including ER visits, hospitalizations, and mechanical ventilation, compared to SGLT-2i use alone. Graphical abstract available for this article.

2024-03-26 — Treatment Heterogeneity of Water, Sanitation, Hygiene, and Nutrition Interventions on Child Growth by Environmental Enteric Dysfunction and Pathogen Status for Young Children in Bangladesh

Authors: Zachary Butzin-Dozier, Yunwen Ji, Jeremy Coyle, I. Malenica, Elizabeth T. Rogawski McQuade, J. Grembi, J. Platts-Mills, E. Houpt, Jay P Graham, Shahjahan Ali, M. Rahman, Mohammad Alauddin, S. L. Famida, S. Akther, Md. Saheen Hossen, P. Mutsuddi, A. Shoab, Mahbubur Rahman, Md Ohedul Islam, Rana Miah, M. Taniuchi, Jie Liu, Sarah T. Alauddin, Christine P. Stewart, S. Luby, J. Colford, Alan E. Hubbard, Andrew N. Mertens, A. Lin
Year: 2024
Publication Date: 2024-03-26
Venue: medRxiv
DOI: 10.1101/2024.03.21.24304684
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Background: Water, sanitation, hygiene (WSH), nutrition (N), and combined (N+WSH) interventions are often implemented by global health organizations, but WSH interventions may insufficiently reduce pathogen exposure, and nutrition interventions may be modified by environmental enteric dysfunction (EED), a condition of increased intestinal permeability and inflammation. This study investigated the heterogeneity of these treatments' effects based on individual pathogen and EED biomarker status with respect to child linear growth. Methods: We applied cross-validated targeted maximum likelihood estimation and super learner ensemble machine learning to assess the conditional treatment effects in subgroups defined by biomarker and pathogen status. We analyzed treatment (N+WSH, WSH, N, or control) randomly assigned in-utero, child pathogen and EED data at 14 months of age, and child LAZ at 28 months of age. We estimated the difference in mean child length for age Z-score (LAZ) under the treatment rule and the difference in stratified treatment effect (treatment effect difference) comparing children with high versus low pathogen/biomarker status while controlling for baseline covariates. Results: We analyzed data from 1,522 children, who had median LAZ of -1.56. We found that myeloperoxidase (N+WSH treatment effect difference 0.0007 LAZ, WSH treatment effect difference 0.1032 LAZ, N treatment effect difference 0.0037 LAZ) and Campylobacter infection (N+WSH treatment effect difference 0.0011 LAZ, WSH difference 0.0119 LAZ, N difference 0.0255 LAZ) were associated with greater effect of all interventions on growth. In other words, children with high myeloperoxidase or Campylobacter infection experienced a greater impact of the interventions on growth. 
We found that a treatment rule that assigned the N+WSH (LAZ difference 0.23, 95% CI (0.05, 0.41)) and WSH (LAZ difference 0.17, 95% CI (0.04, 0.30)) interventions based on EED biomarkers and pathogens increased predicted child growth compared to the randomly allocated intervention. Conclusions: These findings indicate that EED biomarker and pathogen status, particularly Campylobacter and myeloperoxidase (a measure of gut inflammation), may be related to impact of N+WSH, WSH, and N interventions on child linear growth.

2024-03-20 — A non-negative spike-and-slab lasso generalized linear stacking prediction modeling method for high-dimensional omics data

Authors: Junjie Shen, Shuo Wang, Yongfei Dong, Hao Sun, Xichao Wang, Zaixiang Tang
Year: 2024
Publication Date: 2024-03-20
Venue: BMC Bioinformatics
DOI: 10.1186/s12859-024-05741-6
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background High-dimensional omics data are increasingly utilized in clinical and public health research for disease risk prediction. Many previous sparse methods have been proposed that use prior knowledge, e.g., biological group structure information, to guide the model-building process. However, these methods are still based on a single model, often leading to overconfident inferences and inferior generalization. Results We proposed a novel stacking strategy based on a non-negative spike-and-slab Lasso (nsslasso) generalized linear model (GLM) for disease risk prediction in the context of high-dimensional omics data. Briefly, we used prior biological knowledge to segment omics data into a set of sub-data. Each sub-model was trained separately using the features from the group via a proper base learner. Then, the predictions of sub-models were ensembled by a super learner using nsslasso GLM. The proposed method was compared to several competitors, such as the Lasso, grlasso, and gsslasso, using simulated data and two open-access breast cancer datasets. As a result, the proposed method showed robustly superior prediction performance to the optimal single-model method in high-noise simulated data and real-world data. Furthermore, compared to the traditional stacking method, the proposed nsslasso stacking method can efficiently handle redundant sub-models and identify important sub-models. Conclusions The proposed nsslasso method demonstrated favorable predictive accuracy, stability, and biological interpretability. Additionally, the proposed method can also be used to detect new biomarkers and key group structures.

2024-03-15 — Assessing the effectiveness of special education services on fifth grade math scores: Using traditional and machine learning methods with ECLS-K data

Authors: Yuxiang Feng
Year: 2024
Publication Date: 2024-03-15
Venue: Applied and Computational Engineering
DOI: 10.54254/2755-2721/45/20241019
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
The Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K) is a well-known research endeavor in the field of child development. In this research, some special education services are offered to those students who need supplementary support in some aspects. In this paper, we aim to estimate the average treatment effect (ATE) on students' fifth-grade math scores and assess the effectiveness of these special education services based on the ECLS-K dataset, through both machine learning methods and traditional methods. We introduce Donald Rubin's causal model and Propensity Score Analysis in the part on traditional methods, and Ordinary Least Squares (OLS), Targeted Maximum Likelihood Estimation (TMLE), Bayesian Additive Regression Trees (BART), Generalized Random Forests (GRF) and Double Machine Learning (DML) in the part on machine learning methods. Finally, we employ Propensity Score Matching, OLS and BART to estimate the ATE. All estimated ATEs are significantly different from zero. The estimated ATEs are found to be negative, suggesting that these special education services may have a negative effect on students' fifth-grade math scores. This conclusion is inconsistent with the original intent of these services, which aimed to have a positive impact.

2024-03-13 — Gender-specific prolactin thresholds to determine prolactinoma size: a novel Bayesian approach and its clinical utility

Authors: Markus Huber, M. M. Luedi, G. Schubert, C. Musahl, Angelo Tortora, J. Frey, Jürgen Beck, L. Mariani, Emanuel Christ, Lukas Andereggen
Year: 2024
Publication Date: 2024-03-13
Venue: Frontiers in Surgery
DOI: 10.3389/fsurg.2024.1363431
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background In clinical practice, the size of adenomas is crucial for guiding prolactinoma patients towards the most suitable initial treatment. Consequently, establishing guidelines for serum prolactin level thresholds to assess prolactinoma size is essential. However, the potential impact of gender differences in prolactin levels on estimating adenoma size (micro- vs. macroadenoma) is not yet fully comprehended. Objective To introduce a novel statistical method for deriving gender-specific prolactin thresholds to discriminate between micro- and macroadenomas and to assess their clinical utility. Methods We present a novel, multilevel Bayesian logistic regression approach to compute observationally constrained gender-specific prolactin thresholds in a large cohort of prolactinoma patients (N = 133) with respect to dichotomized adenoma size. The robustness of the approach is examined with an ensemble machine learning approach (a so-called super learner), where the observed differences in prolactin and adenoma size between female and male patients are preserved and the initial sample size is artificially increased tenfold. Results The framework results in a global prolactin threshold of 239.4 μg/L (95% credible interval: 44.0–451.2 μg/L) to discriminate between micro- and macroadenomas. We find evidence of gender-specific prolactin thresholds of 211.6 μg/L (95% credible interval: 29.0–426.2 μg/L) for women and 1,046.1 μg/L (95% credible interval: 582.2–2,325.9 μg/L) for men. Global (that is, gender-independent) thresholds result in a high sensitivity (0.97) and low specificity (0.57) when evaluated among men as most prolactin values are above the global threshold. Applying male-specific thresholds results in a slightly different scenario, with a high specificity (0.99) and moderate sensitivity (0.74). The male-dependent prolactin threshold shows large uncertainty and features some dependency on the choice of priors, in particular for small sample sizes. 
The augmented datasets demonstrate that future, larger cohorts are likely able to reduce the uncertainty range of the prolactin thresholds. Conclusions The proposed framework represents a significant advancement in patient-centered care for treating prolactinoma patients by introducing gender-specific thresholds. These thresholds enable tailored treatment strategies by distinguishing between micro- and macroadenomas based on gender. Specifically, in men, a negative diagnosis using a universal prolactin threshold can effectively rule out a macroadenoma, while a positive diagnosis using a male-specific prolactin threshold can indicate its presence. However, the clinical utility of a female-specific prolactin threshold in our cohort is limited. This framework can be easily adapted to various biomedical settings with two subgroups having imbalanced average biomarkers and outcomes of interest. Using machine learning techniques to expand the dataset while preserving significant observed imbalances presents a valuable method for assessing the reliability of gender-specific threshold estimates. However, external cohorts are necessary to thoroughly validate our thresholds.

2024-03-11 — Randomized Trial of Dynamic Choice HIV Prevention at Antenatal and Postnatal Care Clinics in Rural Uganda and Kenya

Authors: J. Kabami, Catherine A Koss, Helen Sunday, Edith Biira, Marilyn Nyabuti, L. Balzer, Shalika Gupta, G. Chamie, J. Ayieko, E. Kakande, Melanie C. Bacon, Diane V. Havlir, M. Kamya, Maya Petersen
Year: 2024
Publication Date: 2024-03-11
Venue: Journal of Acquired Immune Deficiency Syndromes
DOI: 10.1097/QAI.0000000000003383
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Pregnant and postpartum women in Sub-Saharan Africa are at high risk of HIV acquisition. We evaluated a person-centered dynamic choice intervention for HIV prevention (DCP) among women attending antenatal and postnatal care. Setting: Rural Kenya and Uganda. Methods: Women (aged 15 years or older) at risk of HIV acquisition seen at antenatal and postnatal care clinics were individually randomized to DCP vs. standard of care (SEARCH; NCT04810650). The DCP intervention included structured client choice of product (daily oral pre-exposure prophylaxis or postexposure prophylaxis), service location (clinic or out of facility), and HIV testing modality (self-test or provider-administered), with option to switch over time and person-centered care (phone access to clinician, structured barrier assessment and counseling, and provider training). The primary outcome was biomedical prevention coverage—proportion of 48-week follow-up with self-reported pre-exposure prophylaxis or postexposure prophylaxis use, compared between arms using targeted maximum likelihood estimation. Results: Between April and July 2021, we enrolled 400 women (203 intervention and 197 control); 38% were pregnant, 52% were aged 15–24 years, and 94% reported no pre-exposure prophylaxis or postexposure prophylaxis use for ≥6 months before baseline. Among 384/400 participants (96%) with outcome ascertained, DCP increased biomedical prevention coverage 40% (95% CI: 34% to 47%; P < 0.001); the coverage was 70% in intervention vs. 29% in control. DCP also increased coverage during months at risk of HIV (81% in intervention, 43% in control; 38% absolute increase; 95% CI: 31% to 45%; P < 0.001). Conclusion: A person-centered dynamic choice intervention that provided flexibility in product, testing, and service location more than doubled biomedical HIV prevention coverage in a high-risk population already routinely offered access to biomedical prevention options.

2024-03-08 — Using the Super Learner algorithm to predict risk of major adverse cardiovascular events after percutaneous coronary intervention in patients with myocardial infarction

Authors: Xiang Zhu, Pin Zhang, Han Jiang, Jie Kuang, Lei Wu
Year: 2024
Publication Date: 2024-03-08
Venue: BMC Medical Research Methodology
DOI: 10.1186/s12874-024-02179-5
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background The primary treatment for patients with myocardial infarction (MI) is percutaneous coronary intervention (PCI). Despite this, the incidence of major adverse cardiovascular events (MACEs) remains a significant concern. Our study seeks to optimize PCI predictive modeling by employing an ensemble learning approach to identify the most effective combination of predictive variables. Methods and results We conducted a retrospective, non-interventional analysis of MI patient data from 2018 to 2021, focusing on those who underwent PCI. Our principal metric was the occurrence of 1-year postoperative MACEs. Variable selection was performed using lasso regression, and predictive models were developed using the Super Learner (SL) algorithm. Model performance was appraised by the area under the receiver operating characteristic curve (AUC) and the average precision (AP) score. Our cohort included 3,880 PCI patients, with 475 (12.2%) experiencing MACEs within one year. The SL model exhibited superior discriminative performance, achieving a validated AUC of 0.982 and an AP of 0.971, which markedly surpassed the traditional logistic regression models (AUC: 0.826, AP: 0.626) in the test cohort. Thirteen variables were significantly associated with the occurrence of 1-year MACEs. Conclusion Implementing the Super Learner algorithm has substantially enhanced the predictive accuracy for the risk of MACEs in MI patients. This advancement presents a promising tool for clinicians to craft individualized, data-driven interventions to better patient outcomes.

2024-03-02 — Correction to: An adaptive model of optimal traffic flow prediction using adaptive wildfire optimization and spatial pattern super learning

Authors: Rishabh Jain, Sunita Dhingra, Kamaldeep Joshi, Amit Grover
Year: 2024
Publication Date: 2024-03-02
Venue: Wireless networks
DOI: 10.1007/s11276-024-03710-8
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2024-02-26 — Securing Emerging IoT Environments with Super Learner Ensembles

Authors: Abdelraouf M. Ishtaiwi, Ali Al Maqousi, A. Aldweesh
Year: 2024
Publication Date: 2024-02-26
Venue: International Conference Control and Robots
DOI: 10.1109/ICCR61006.2024.10533002
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This paper investigates the efficacy of the Super Learner ensemble algorithm for robust anomaly detection in Internet of Things (IoT) network traffic. The recently released CIC IoT Dataset 2023, which contains both normal background traffic and common cyber attack patterns, is utilized as an evaluation benchmark. The Super Learner ensemble integrates five diverse base classifiers - Support Vector Machines (SVMs), logistic regression, neural networks, random forests, and K-Nearest Neighbors (KNNs). Extensive empirical analysis is conducted by training on a large stratified sample and testing generalization performance on a held-out set. Results demonstrate that the heterogeneous Super Learner ensemble achieves 94.2% test accuracy in detecting anomalies, which significantly outperforms any individual base model by over 7 percentage points. Precision, recall, and F1 metrics are also markedly improved by the ensemble approach compared to single-learner solutions.

2024-02-26 — Developing automated machine learning approach for fast and robust crop yield prediction using a fusion of remote sensing, soil, and weather dataset

Authors: A. Kheir, Ajit Govind, Vinay Nangia, M. Devkota, Abdelrazek Elnashar, M. E. D. Omar, T. Feike
Year: 2024
Publication Date: 2024-02-26
Venue: Environmental Research Communications
DOI: 10.1088/2515-7620/ad2d02
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Estimating smallholder crop yields robustly and timely is crucial for improving agronomic practices, determining yield gaps, guiding investment, and policymaking to ensure food security. However, there is poor estimation of yield for most smallholders due to lack of technology and field-scale data, particularly in Egypt. Automated machine learning (AutoML) can be used to automate the machine learning workflow, including automatic training and optimization of multiple models within a user-specified time frame, but it has received little attention so far. Here, we combined extensive field survey yield across wheat cultivated area in Egypt with diverse dataset of remote sensing, soil, and weather to predict field-level wheat yield using 22 ML models in AutoML. The models showed robust accuracies for yield predictions, recording Willmott degree of agreement (d > 0.80), with higher accuracy when super learner (stacked ensemble) was used (R² = 0.51, d = 0.82). The trained AutoML was deployed to predict yield using remote sensing (RS) vegetative indices (VIs), demonstrating a good correlation with actual yield (R² = 0.7). This is very important since it is considered a low-cost tool and could be used to explore early yield predictions. Since climate change has negative impacts on agricultural production and food security with some uncertainties, AutoML was deployed to predict wheat yield under recent climate scenarios from the Coupled Model Intercomparison Project Phase 6 (CMIP6). These scenarios included single downscaled General Circulation Model (GCM) as CanESM5 and two shared socioeconomic pathways (SSPs) as SSP2-4.5 and SSP5-8.5 during the mid-term period (2050). The stacked ensemble model displayed declines in yield of 21% and 5% under SSP5-8.5 and SSP2-4.5 respectively during mid-century, with higher uncertainty under the highest emission scenario (SSP5-8.5).
The developed approach could be used as a rapid, accurate and low-cost method to predict yield for stakeholder farms all over the world where ground data is scarce.

2024-02-18 — Predicting Co-Occurring Mental Health and Substance Use Disorders in Women: An Automated Machine Learning Approach

Authors: Nirmal Acharya, Padmaja Kar, Mustafa A. Ally, Jeffrey Soar
Year: 2024
Publication Date: 2024-02-18
Venue: Applied Sciences
DOI: 10.3390/app14041630
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Significant clinical overlap exists between mental health and substance use disorders, especially among women. The purpose of this research is to leverage an AutoML (Automated Machine Learning) interface to predict and distinguish co-occurring mental health (MH) and substance use disorders (SUD) among women. By employing various modeling algorithms for binary classification, including Random Forest, Gradient Boosted Trees, XGBoost, Extra Trees, SGD, Deep Neural Network, Single-Layer Perceptron, K Nearest Neighbors (grid), and a super learning model (constructed by combining the predictions of a Random Forest model and an XGBoost model), the research aims to provide healthcare practitioners with a powerful tool for earlier identification, intervention, and personalised support for women at risk. The present research presents a machine learning (ML) methodology for more accurately predicting the co-occurrence of mental health (MH) and substance use disorders (SUD) in women, utilising the Treatment Episode Data Set Admissions (TEDS-A) from the year 2020 (n = 497,175). A super learning model was constructed by combining the predictions of a Random Forest model and an XGBoost model. The model demonstrated promising predictive performance in predicting co-occurring MH and SUD in women with an AUC = 0.817, Accuracy = 0.751, Precision = 0.743, Recall = 0.926 and F1 Score = 0.825. The use of accurate prediction models can substantially facilitate the prompt identification and implementation of intervention strategies.

2024-02-06 — SSRI Use During Acute COVID-19 Infection Associated with Lower Risk of Long COVID Among Patients with Depression

Authors: Zachary Butzin-Dozier, Yunwen Ji, Sarang Deshpande, Eric Hurwitz, Jeremy Coyle, Junming Shi, Andrew Mertens, M. van der Laan, J. Colford, Rena C Patel, Alan E. Hubbard, on behalf of the National COVID Cohort Collaborative
Year: 2024
Publication Date: 2024-02-06
Venue: medRxiv
DOI: 10.1101/2024.02.05.24302352
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Long COVID, also known as post-acute sequelae of COVID-19 (PASC), is a poorly understood condition with symptoms across a range of biological domains that often have debilitating consequences. Some have recently suggested that lingering SARS-CoV-2 virus in the gut may impede serotonin production and that low serotonin may drive many Long COVID symptoms across a range of biological systems. Therefore, selective serotonin reuptake inhibitors (SSRIs), which increase synaptic serotonin availability, may prevent or treat Long COVID. SSRIs are commonly prescribed for depression, therefore restricting a study sample to only include patients with depression can reduce the concern of confounding by indication. Methods: In an observational sample of electronic health records from patients in the National COVID Cohort Collaborative (N3C) with a COVID-19 diagnosis between September 1, 2021, and December 1, 2022, and pre-existing major depressive disorder, the leading indication for SSRI use, we evaluated the relationship between SSRI use at the time of COVID-19 infection and subsequent 12-month risk of Long COVID (defined by ICD-10 code U09.9). We defined SSRI use as a prescription for SSRI medication beginning at least 30 days before COVID-19 infection and not ending before COVID-19 infection. To minimize bias, we estimated the causal associations of interest using a nonparametric approach, targeted maximum likelihood estimation, to aggressively adjust for high-dimensional covariates. Results: We analyzed a sample (n = 506,903) of patients with a diagnosis of major depressive disorder before COVID-19 diagnosis, where 124,928 (25%) were using an SSRI. We found that SSRI users had a significantly lower risk of Long COVID compared to nonusers (adjusted causal relative risk 0.90, 95% CI (0.86, 0.94)). 
Conclusion: These findings suggest that SSRI use during COVID-19 infection may be protective against Long COVID, supporting the hypothesis that serotonin may be a key mechanistic biomarker of Long COVID.

2024-02-05 — Post-traumatic stress disorder as a risk factor for major adverse cardiovascular events: a cohort study of a South African medical insurance scheme

Authors: C. Mesa-Vieira, C. Didden, Michael Schomaker, J. Mouton, N. Folb, L. L. van den Heuvel, Chiara Gastaldon, M. Cornell, M. Tlali, R. Kassanjee, Oscar H Franco, S. Seedat, A. Haas
Year: 2024
Publication Date: 2024-02-05
Venue: Epidemiology and Psychiatric Sciences
DOI: 10.1017/S2045796024000052
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Aims Prior research, largely focused on US male veterans, indicates an increased risk of cardiovascular disease among individuals with post-traumatic stress disorder (PTSD). Data from other settings and populations are scarce. The objective of this study is to examine PTSD as a risk factor for incident major adverse cardiovascular events (MACEs) in South Africa. Methods We analysed reimbursement claims (2011–2020) of a cohort of South African medical insurance scheme beneficiaries aged 18 years or older. We calculated adjusted hazard ratios (aHRs) for associations between PTSD and MACEs using Cox proportional hazard models and calculated the effect of PTSD on MACEs using longitudinal targeted maximum likelihood estimation. Results We followed 1,009,113 beneficiaries over a median of 3.0 years (IQR 1.1–6.0). During follow-up, 12,662 (1.3%) persons were diagnosed with PTSD and 39,255 (3.9%) had a MACE. After adjustment for sex, HIV status, age, population group, substance use disorders, psychotic disorders, major depressive disorder, sleep disorders and the use of antipsychotic medication, PTSD was associated with a 16% increase in the risk of MACEs (aHR 1.16, 95% confidence interval (CI) 1.05–1.28). The risk ratio for the effect of PTSD on MACEs decreased from 1.59 (95% CI 1.49–1.68) after 1 year of follow-up to 1.14 (95% CI 1.11–1.16) after 8 years of follow-up. Conclusion Our study provides empirical support for an increased risk of MACEs in males and females with PTSD from a general population sample in South Africa. These findings highlight the importance of monitoring cardiovascular risk among individuals diagnosed with PTSD.

2024-02-03 — An adaptive model of optimal traffic flow prediction using adaptive wildfire optimization and spatial pattern super learning

Authors: Rishabh Jain, Sunita Dhingra, Kamaldeep Joshi, Amit Grover
Year: 2024
Publication Date: 2024-02-03
Venue: Wireless networks
DOI: 10.1007/s11276-023-03609-w
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2024-02-01 — Performance comparison of machine learning models used for predicting subclinical mastitis in dairy cows: bagging, boosting, stacking and super-learner ensembles versus single machine learning models.

Authors: A. Satoła, K. Satoła
Year: 2024
Publication Date: 2024-02-01
Venue: Journal of Dairy Science
DOI: 10.3168/jds.2023-24243
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Mastitis has a substantial impact on the dairy industry across the world, causing dairy producers to suffer losses due to the reduced quality and quantity of produced milk. A further problem, related to this issue, is the excessive use of antibiotics that leads to the development of resistance in different bacterial strains. The growing consumer awareness oriented toward food safety and rational use of antibiotics has promoted the search for new methods of early identification of cows that may be at risk of developing the disease. Subclinical mastitis does not cause any visible changes to the udder or milk, and therefore it is more difficult to detect than clinical mastitis. The collection of large amounts of data related to milk performance of cows allows using machine learning (ML) methods to build models that could be used for classifying cows into healthy and at risk of subclinical mastitis. The data used for the purpose of this study included information from routine milk recording procedures. The data set consisted of 19,856 records of 2,227 Polish Holstein-Friesian cows from 3 herds. The authors decided to use the approach of building ensemble ML models, in particular bagging, boosting, stacking and super learner models, and comparing them for accuracy of identification of disease-affected cows against single ML models based on the Support Vector Machines, Logistic Regression, Gaussian Naïve Bayes, k-Nearest Neighbors and Decision Tree algorithms. The models were trained and evaluated based on the information recorded for herd 1 and using an 80:20 train-test split ratio according to animal ID (to avoid data leakage). The information recorded for herds 2 and 3 was only used to evaluate on unseen data models developed using the herd 1 data set. 
Among the single ML models, the Support Vector Machines model was found to be the most accurate in predicting subclinical mastitis at subsequent test-day when used both for the training set (mean F1-score of 0.760) and the testing sets containing data for herds 1, 2 and 3 (F1-score of 0.778, 0.790 and 0.741 respectively). The Gradient Boosting model was found to be the best performing model among the ensemble ML models (F1-score of 0.762, 0.779, 0.791 and 0.723 for the training set and the testing sets respectively). The super learner model, featuring the most advanced design and Logistic Regression in the meta layer, achieved the highest mean F1-score of 0.775 during the cross-validation, however, it was characterized by a slightly worse prediction accuracy of the testing sets (mean F1-score of 0.768, 0.790 and 0.693 for herds 1, 2 and 3 respectively). The study findings confirm the promising role of ensemble ML methods that were found to be slightly superior with respect to most of the single ML models.

2024-01-29 — Adaptive sequential surveillance with network and temporal dependence.

Authors: I. Malenica, Jeremy R Coyle, M. J. van der Laan, Maya L Petersen
Year: 2024
Publication Date: 2024-01-29
Venue: Biometrics
DOI: 10.1093/biomtc/ujad007
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Strategic test allocation is important for control of both emerging and existing pandemics (eg, COVID-19, HIV). It supports effective epidemic control by (1) reducing transmission via identifying cases and (2) tracking outbreak dynamics to inform targeted interventions. However, infectious disease surveillance presents unique statistical challenges. For instance, the true outcome of interest (positive infection status) is often a latent variable. In addition, presence of both network and temporal dependence reduces data to a single observation. In this work, we study an adaptive sequential design, which allows for unspecified dependence among individuals and across time. Our causal parameter is the mean latent outcome we would have obtained, if, starting at time t given the observed past, we had carried out a stochastic intervention that maximizes the outcome under a resource constraint. The key strength of the method is that we do not have to model network and time dependence: a short-term performance Online Super Learner is used to select among dependence models and randomization schemes. The proposed strategy learns the optimal choice of testing over time while adapting to the current state of the outbreak and learning across samples, through time, or both. We demonstrate the superior performance of the proposed strategy in an agent-based simulation modeling a residential university environment during the COVID-19 pandemic.

2024-01-26 — Accelerating Elastic Property Prediction in Fe-C Alloys through Coupling of Molecular Dynamics and Machine Learning

Authors: Sandesh Risal, Navdeep Singh, Yan Yao, Li Sun, S. Risal, Weihang Zhu
Year: 2024
Publication Date: 2024-01-26
Venue: Materials
DOI: 10.3390/ma17030601
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The scarcity of high-quality data presents a major challenge to the prediction of material properties using machine learning (ML) models. Obtaining material property data from experiments is economically cost-prohibitive, if not impossible. In this work, we address this challenge by generating an extensive material property dataset comprising thousands of data points pertaining to the elastic properties of Fe-C alloys. The data were generated using molecular dynamic (MD) calculations utilizing reference-free Modified embedded atom method (RF-MEAM) interatomic potential. This potential was developed by fitting atomic structure-dependent energies, forces, and stress tensors evaluated at ground state and finite temperatures using ab-initio. Various ML algorithms were subsequently trained and deployed to predict elastic properties. In addition to individual algorithms, super learner (SL), an ensemble ML technique, was incorporated to refine predictions further. The input parameters comprised the alloy’s composition, crystal structure, interstitial sites, lattice parameters, and temperature. The target properties were the bulk modulus and shear modulus. Two distinct prediction approaches were undertaken: employing individual models for each property prediction and simultaneously predicting both properties using a single integrated model, enabling a comparative analysis. The efficiency of these models was assessed through rigorous evaluation using a range of accuracy metrics. This work showcases the synergistic power of MD simulations and ML techniques for accelerating the prediction of elastic properties in alloys.

2024-01-25 — Antibody selection strategies and their impact in predicting clinical malaria based on multi-sera data

Authors: André Fonseca, Mikolaj Spytek, P. Biecek, C. Cordeiro, Nuno Sepúlveda
Year: 2024
Publication Date: 2024-01-25
Venue: BioData Mining
DOI: 10.1186/s13040-024-00354-4
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background Nowadays, the chance of discovering the best antibody candidates for predicting clinical malaria has notably increased due to the availability of multi-sera data. The analysis of these data is typically divided into a feature selection phase followed by a predictive one where several models are constructed for predicting the outcome of interest. A key question in the analysis is to determine which antibodies should be included in the predictive stage and whether they should be included in the original or a transformed scale (i.e. binary/dichotomized). Methods To answer this question, we developed three approaches for antibody selection in the context of predicting clinical malaria: (i) a basic and simple approach based on selecting antibodies via the nonparametric Mann–Whitney-Wilcoxon test; (ii) an optimal dichotomization approach where each antibody was selected according to the optimal cut-off via maximization of the chi-squared (χ^2) statistic for two-way tables; (iii) a hybrid parametric/non-parametric approach that integrates Box-Cox transformation followed by a t-test, together with the use of finite mixture models and the Mann–Whitney-Wilcoxon test as a last resort. We illustrated the application of these three approaches with published serological data of 36 Plasmodium falciparum antigens for predicting clinical malaria in 121 Kenyan children. The predictive analysis was based on a Super Learner where predictions from multiple classifiers including the Random Forest were pooled together. Results Our results led to almost similar areas under the Receiver Operating Characteristic curves of 0.72 (95% CI = [0.62, 0.82]), 0.80 (95% CI = [0.71, 0.89]), 0.79 (95% CI = [0.7, 0.88]) for the simple, dichotomization and hybrid approaches, respectively. These approaches were based on 6, 20, and 16 antibodies, respectively.
Conclusions The three feature selection strategies provided a better predictive performance of the outcome when compared to the previous results relying on Random Forest including all the 36 antibodies (AUC = 0.68, 95% CI = [0.57;0.79]). Given the similar predictive performance, we recommended that the three strategies should be used in conjunction in the same data set and selected according to their complexity.

2024-01-08 — Two-Step Targeted Minimum-Loss Based Estimation for Non-Negative Two-Part Outcomes

Authors: Nicholas T Williams, Richard Liu, Katherine L. Hoffman, Sarah Forrest, Kara E. Rudolph, Iván Díaz
Year: 2024
Publication Date: 2024-01-08
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Non-negative two-part outcomes are defined as outcomes with a density function that have a zero point mass but are otherwise positive. Examples, such as healthcare expenditure and hospital length of stay, are common in healthcare utilization research. Despite the practical relevance of non-negative two-part outcomes, very few methods exist to leverage knowledge of their semicontinuity to achieve improved performance in estimating causal effects. In this paper, we develop a nonparametric two-step targeted minimum-loss based estimator (denoted as hTMLE) for non-negative two-part outcomes. We present methods for a general class of interventions referred to as modified treatment policies, which can accommodate continuous, categorical, and binary exposures. The two-step TMLE uses a targeted estimate of the intensity component of the outcome to produce a targeted estimate of the binary component of the outcome that may improve finite sample efficiency. We demonstrate the efficiency gains achieved by the two-step TMLE with simulated examples and then apply it to a cohort of Medicaid beneficiaries to estimate the effect of chronic pain and physical disability on days' supply of opioids.

2024-01-02 — The role of psychosocial well-being and emotion-driven impulsiveness in food choices of European adolescents

Authors: Stefanie Do, Vanessa Didelez, C. Börnhorst, J. Coumans, L. Reisch, U. Danner, P. Russo, T. Veidebaum, M. Tornaritis, Dénes Molnár, M. Hunsberger, S. de Henauw, Luis A. Moreno, W. Ahrens, A. Hebestreit
Year: 2024
Publication Date: 2024-01-02
Venue: International Journal of Behavioral Nutrition and Physical Activity
DOI: 10.1186/s12966-023-01551-w
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background It is unclear whether a hypothetical intervention targeting either psychosocial well-being or emotion-driven impulsiveness is more effective in reducing unhealthy food choices. Therefore, we aimed to compare the (separate) causal effects of psychosocial well-being and emotion-driven impulsiveness on European adolescents’ sweet and fat propensity. Methods We included 2,065 participants of the IDEFICS/I.Family cohort (mean age: 13.4) providing self-reported data on sweet propensity (score range: 0 to 68.4), fat propensity (range: 0 to 72.6), emotion-driven impulsiveness using the UPPS-P negative urgency subscale, and psychosocial well-being using the KINDL^R Questionnaire. We estimated, separately, the average causal effects of psychosocial well-being and emotion-driven impulsiveness on sweet and fat propensity applying a semi-parametric doubly robust method (targeted maximum likelihood estimation). Further, we investigated a potential indirect effect of psychosocial well-being on sweet and fat propensity mediated via emotion-driven impulsiveness using a causal mediation analysis. Results If all adolescents, hypothetically, had high levels of psychosocial well-being, compared to low levels, we estimated a decrease in average sweet propensity by 1.43 [95%-confidence interval: 0.25 to 2.61]. A smaller effect was estimated for fat propensity. Similarly, if all adolescents had high levels of emotion-driven impulsiveness, compared to low levels, average sweet propensity would be decreased by 2.07 [0.87 to 3.26] and average fat propensity by 1.85 [0.81 to 2.88]. The indirect effect of psychosocial well-being via emotion-driven impulsiveness was 0.61 [0.24 to 1.09] for average sweet propensity and 0.55 [0.13 to 0.86] for average fat propensity. Conclusions An intervention targeting emotion-driven impulsiveness, compared to psychosocial well-being, would be marginally more effective in reducing sweet and fat propensity in adolescents.

2024-01-02 — PROVIDENT: Development and Validation of a Machine Learning Model to Predict Neighborhood-level Overdose Risk in Rhode Island

Authors: Bennett Allen, Robert C. Schell, Victoria A. Jent, Maxwell S Krieger, Claire Pratty, Benjamin D Hallowell, W. Goedel, Melissa Basta, Jesse L. Yedinak, Yu Li, Abigail R Cartus, Brandon D. L. Marshall, Magdalena Cerdá, Jennifer Ahern, Daniel B Neill
Year: 2024
Publication Date: 2024-01-02
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001695
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Drug overdose persists as a leading cause of death in the United States, but resources to address it remain limited. As a result, health authorities must consider where to allocate scarce resources within their jurisdictions. Machine learning offers a strategy to identify areas with increased future overdose risk to proactively allocate overdose prevention resources. This modeling study is embedded in a randomized trial to measure the effect of proactive resource allocation on statewide overdose rates in Rhode Island (RI). Methods: We used statewide data from RI from 2016 to 2020 to develop an ensemble machine learning model predicting neighborhood-level fatal overdose risk. Our ensemble model integrated gradient boosting machine and super learner base models in a moving window framework to make predictions in 6-month intervals. Our performance target, developed a priori with the RI Department of Health, was to identify the 20% of RI neighborhoods containing at least 40% of statewide overdose deaths, including at least one neighborhood per municipality. The model was validated after trial launch. Results: Our model selected priority neighborhoods capturing 40.2% of statewide overdose deaths during the test periods and 44.1% of statewide overdose deaths during validation periods. Our ensemble outperformed the base models during the test periods and performed comparably to the best-performing base model during the validation periods. Conclusions: We demonstrated the capacity for machine learning models to predict neighborhood-level fatal overdose risk to a degree of accuracy suitable for practitioners. Jurisdictions may consider predictive modeling as a tool to guide allocation of scarce resources.

2024 — Predicting flexural-creep stiffness in bending beam rheometer (BBR) experiments using advanced super learner machine learning techniques

Authors: Alireza Roshan, Magdy Abdelrahman
Year: 2024
Venue: Research on Engineering Structures and Materials
DOI: 10.17515/resm2024.58me1027rs
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
BBR test is commonly used to assess the low-temperature performance grade (PG) of asphalt binders, with the flexural-creep stiffness being a critical parameter calculated through this test. However, it has notable limitations that demand attention. The significant amount of asphalt binder needed for test specimens increases costs and resource consumption. Additionally, the complex and time-consuming specimen preparation process hinders testing efficiency and introduces result variability, affecting the accuracy and reliability of PG determinations. In recent years, machine learning (ML) has emerged as a promising substitute for predicting various engineering values. In this study, the primary focus was on harnessing super learner (SL) techniques to predict the creep stiffness of asphalt binders. The SL approach combines multiple ML algorithms to enhance predictive accuracy and reduce individual model biases. Bagging, boosting, and stacking algorithms were employed in the construction of these prediction models. To conduct the investigation, data from 1350 samples sourced from the Long-Term Pavement Performance (LTPP) website were utilized to explore the influence of six crucial variables on the creep stiffness of asphalt binders. The proposed method demonstrated high accuracy, nearing 90% in the coefficient of determination. The Stacking model achieved a low Mean Absolute Percentage Error of 2.86% and robust Prediction Accuracy of 97.14% for randomly selected data points. Furthermore, the sensitivity analysis highlighted the significance of distinct input variables in influencing the creep stiffness of asphalt binders. Notably, the test temperature emerged as the most influential factor affecting creep stiffness, according to the conducted study.

2024 — Physician Effects in Critical Care: A Causal Inference Approach Through Propensity Weighting with Parametric and Super Learning Methods

Authors: Yuan Bian, Yu Shi, Hui Guo, Grace Y. Yi, Wenqing He
Year: 2024
Venue: Journal of Data Science
DOI: 10.6339/24-jds1143
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Physician performance is critical to caring for patients admitted to the intensive care unit (ICU), who are in life-threatening situations and require high level medical care and interventions. Evaluating physicians is crucial for ensuring a high standard of medical care and fostering continuous performance improvement. The non-randomized nature of ICU data often results in imbalance in patient covariates across physician groups, making direct comparisons of the patients’ survival probabilities for each physician misleading. In this article, we utilize the propensity weighting method to address confounding, achieve covariates balance, and assess physician effects. Due to possible model misspecification, we compare the performance of the propensity weighting methods using both parametric models and super learning methods. When the generalized propensity or the quality function is not correctly specified within the parametric propensity weighting framework, super learning-based propensity weighting methods yield more efficient estimators. We demonstrate that utilizing propensity weighting offers an effective way to assess physician performance, a topic of considerable interest to hospital administrators.

2024 — Optimal Electricity Load Interruption Based on Time Series Classification With Super Learner and Feature Filtering

Authors: Solomon Oluwole Akinola, Peter O. Olukanmi, Qing-Guo Wang, T. Marwala
Year: 2024
Venue: IEEE Access
DOI: 10.1109/ACCESS.2024.3399390
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Load-shedding is vital for managing electrical power shortages and avoiding grid collapse. However, excessive electricity demand poses an imminent threat to the overall stability of power grid system (PGS) and its ability to run safely and reliably. Load-shedding strategies can be complicated and inadequate to manage electrical power system efficiently. The study proposed a data-driven load-shedding time series classification (TSC) technique employing a heterogeneous ensemble super learner (eSL) to categorize load-shedding based on contributing features. The model investigated challenges with binary classification while using a multidimensional time series for South Africa’s hourly load-shedding stages in MW collected from PGS data. Considering that load-shedding is planned and predicted based on contributing features, we use these features as strong indicators to classify expected outcomes for load-shedding or no load-shedding. Validation tests for the suggested technique included the precision recall curve, the confusion matrix, the class likelihood ratio, the Brier skill scores and critical difference factor (CDF). Logistic regression (LR) produced the highest CDF average score, while support vector classifier (SVC) had the highest balanced precision (90.694%). The recursive feature elimination (RFE) model exhibited the most significant true negative and true positive counts, at 50.59% and 40.84%, respectively, and the highest proportion of valid classifications.

2024-01-01 — Non-plug-in estimators could outperform plug-in estimators: a cautionary note and a diagnosis

Authors: Hongxiang Qiu
Year: 2024
Publication Date: 2024-01-01
Venue: Epidemiologic Methods
DOI: 10.1515/em-2024-0008
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Objectives: Highly flexible nonparametric estimators have gained popularity in causal inference and epidemiology. Popular examples of such estimators include targeted maximum likelihood estimators (TMLE) and double machine learning (DML). TMLE is often argued or suggested to be better than DML estimators and several other estimators in small to moderate samples – even if they share the same large-sample properties – because TMLE is a plug-in estimator and respects the known bounds on the parameter, while other estimators might fall outside the known bounds and yield absurd estimates. However, this argument is not a rigorously proven result and may fail in certain cases. Methods: In a carefully chosen simulation setting, I compare the performance of several versions of TMLE and DML estimators of the average treatment effect among treated in small to moderate samples. Results: In this simulation setting, DML estimators outperform some versions of TMLE in small samples. TMLE fluctuations are unstable, and hence empirically checking the magnitude of the TMLE fluctuation might alert cases where TMLE might perform poorly. Conclusions: As a plug-in estimator, TMLE is not guaranteed to outperform non-plug-in counterparts such as DML estimators in small samples. Checking the fluctuation magnitude might be a useful diagnosis for TMLE. More rigorous theoretical justification is needed to understand and compare the finite-sample performance of these highly flexible estimators in general.

2024-01-01 — IgG Antibody Responses to Epstein-Barr Virus in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: Their Effective Potential for Disease Diagnosis and Pathological Antigenic Mimicry

Authors: A. Fonseca, Mateusz Szysz, Hoang Thien Ly, C. Cordeiro, N. Sepúlveda
Year: 2024
Publication Date: 2024-01-01
Venue: Medicina
DOI: 10.3390/medicina60010161
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background and Objectives: The diagnosis and pathology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) remain under debate. However, there is a growing body of evidence for an autoimmune component in ME/CFS caused by the Epstein-Barr virus (EBV) and other viral infections. Materials and Methods: In this work, we analyzed a large public dataset on the IgG antibodies to 3054 EBV peptides to understand whether these immune responses could help diagnose patients and trigger pathological autoimmunity; we used healthy controls (HCs) as a comparator cohort. Subsequently, we aimed at predicting the disease status of the study participants using a super learner algorithm targeting an accuracy of 85% when splitting data into train and test datasets. Results: When we compared the data of all ME/CFS patients or the data of a subgroup of those patients with non-infectious or unknown disease triggers to the data of the HC, we could not find an antibody-based classifier that would meet the desired accuracy in the test dataset. However, we could identify a 26-antibody classifier that could distinguish ME/CFS patients with an infectious disease trigger from the HCs with 100% and 90% accuracies in the train and test sets, respectively. We finally performed a bioinformatic analysis of the EBV peptides associated with these 26 antibodies. We found no correlation between the importance metric of the selected antibodies in the classifier and the maximal sequence homology between human proteins and each EBV peptide recognized by these antibodies. Conclusions: In conclusion, these 26 antibodies against EBV have an effective potential for disease diagnosis in a subset of patients. However, the peptides associated with these antibodies are less likely to induce autoimmune B-cell responses that could explain the pathogenesis of ME/CFS.

2024-01-01 — Evaluation of an outreach programme for patients with COVID-19 in an integrated healthcare delivery system: a retrospective cohort study

Authors: Laura C Myers, Brian L. Lawson, Gabriel J. Escobar, Kathleen A Daly, Yi-fen Irene Chen, Richard Dlott, Catherine Lee, Vincent X. Liu
Year: 2024
Publication Date: 2024-01-01
Venue: BMJ Open
DOI: 10.1136/bmjopen-2023-073622
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Objectives: In the first year of the COVID-19 pandemic, health systems implemented programmes to manage outpatients with COVID-19. The goal was to expedite patients’ referral to acute care and prevent overcrowding of medical centres. We sought to evaluate the impact of such a programme, the COVID-19 Home Care Team (CHCT) programme. Design: Retrospective cohort. Setting: Kaiser Permanente Northern California. Participants: Adult members before COVID-19 vaccine availability (1 February 2020–31 January 2021) with positive SARS-CoV-2 tests. Intervention: Virtual programme to track and treat patients with ‘CHCT programme’. Outcomes: The outcomes were (1) COVID-19-related emergency department visit, (2) COVID-19-related hospitalisation and (3) inpatient mortality or 30-day hospice referral. Measures: We estimated the average effect comparing patients who were and were not treated by CHCT. We estimated propensity scores using an ensemble super learner (random forest, XGBoost, generalised additive model and multivariate adaptive regression splines) and augmented inverse probability weighting. Results: There were 98 585 patients with COVID-19. The majority were followed by CHCT (n=80 067, 81.2%). Patients followed by CHCT were older (mean age 43.9 vs 41.6 years, p<0.001) and more comorbid with COmorbidity Point Score, V.2, score ≥65 (1.7% vs 1.1%, p<0.001). Unadjusted analyses showed more COVID-19-related emergency department visits (9.5% vs 8.5%, p<0.001) and hospitalisations (3.9% vs 3.2%, p<0.001) in patients followed by CHCT but lower inpatient death or 30-day hospice referral (0.3% vs 0.5%, p<0.001). After weighting, there were higher rates of COVID-19-related emergency department visits (estimated intervention effect −0.8%, 95% CI −1.4% to −0.3%) and hospitalisation (−0.5%, 95% CI −0.9% to −0.1%) but lower inpatient mortality or 30-day hospice referral (−0.5%, 95% CI −0.7% to −0.3%) in patients followed by CHCT. Conclusions: Despite CHCT following older patients with higher comorbidity burden, there appeared to be a protective effect. Patients followed by CHCT were more likely to present to acute care and less likely to die inpatient.

2024 — Disfluency Assessment Using Deep Super Learners

Authors: Sheena Christabel Pravin, Susan Elias, V. Sivaraman, G. Rohith, Y. Asnath, Victy Phamila
Year: 2024
Venue: IEEE Access
DOI: 10.1109/ACCESS.2024.3356350
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The use of machine learning algorithms for the assessment of speech fluency is increasingly becoming recognized globally due to their ability to quickly identify speech impairments. This approach is preferred over manual diagnosis, as it reduces the likelihood of human error and minimizes the delay in commencing the therapy. A pipelined deep learner-dual classifier (PDL-DC) is proposed for the automated detection of speech impairment. The assessment of individuals’ speech fluency consisted of two distinct phases: the classification of speech disfluencies and the categorization of fluency disorders. Speech disfluencies, including revisions, prolongations, whole-word repetitions, word-medial repetitions, and filled pauses, were categorized into distinct groupings. The second aspect of classification pertains to the assessment of fluency levels, wherein speakers are classified into three categories: healthy individuals, individuals with stuttering, and individuals with Specific Language Impairment (SLI). The proposed model’s implementation of a pipelined design enables the dual validation of a subject’s fluency. The proposed model demonstrates an average classification accuracy, precision, and recall of 97%.

2024-01-01 — Decision tree models for the estimation of geo-polymer concrete compressive strength.

Authors: Ji Zhou, Zhanlin Su, Shahab Hosseini, Qiong Tian, Yijun Lu, Hao Luo, Xingquan Xu, Chupeng Chen, Jiandong Huang
Year: 2024
Publication Date: 2024-01-01
Venue: Mathematical biosciences and engineering : MBE
DOI: 10.3934/mbe.2024061
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The green concretes industry benefits from utilizing gel to replace parts of the cement in concretes. However, measuring the compressive strength of geo-polymer concretes (CSGPoC) needs a significant amount of work and expenditure. Therefore, the best idea is predicting CSGPoC with a high level of accuracy. To do this, the base learner and super learner machine learning models were proposed in this study to anticipate CSGPoC. The decision tree (DT) is applied as base learner, and the random forest and extreme gradient boosting (XGBoost) techniques are used as super learner system. In this regard, a database was provided involving 259 CSGPoC data samples, of which four-fifths is considered for the training model and one-fifth is selected for the testing models. The values of fly ash, ground-granulated blast-furnace slag (GGBS), Na2SiO3, NaOH, fine aggregate, gravel 4/10 mm, gravel 10/20 mm, water/solids ratio, and NaOH molarity were considered as input of the models to estimate CSGPoC. To evaluate the reliability and performance of the decision tree (DT), XGBoost, and random forest (RF) models, 12 performance evaluation metrics were determined. Based on the obtained results, the highest degree of accuracy is achieved by the XGBoost model with mean absolute error (MAE) of 2.073, mean absolute percentage error (MAPE) of 5.547, Nash-Sutcliffe (NS) of 0.981, correlation coefficient (R) of 0.991, R2 of 0.982, root mean square error (RMSE) of 2.458, Willmott's index (WI) of 0.795, weighted mean absolute percentage error (WMAPE) of 0.046, Bias of 2.073, square index (SI) of 0.054, p of 0.027, mean relative error (MRE) of -0.014, and a20 of 0.983 for the training model and MAE of 2.06, MAPE of 6.553, NS of 0.985, R of 0.993, R2 of 0.986, RMSE of 2.307, WI of 0.818, WMAPE of 0.05, Bias of 2.06, SI of 0.056, p of 0.028, MRE of -0.015, and a20 of 0.949 for the testing model. By importing the testing set into trained models, values of 0.8969, 0.9857, and 0.9424 for R2 were obtained for DT, XGBoost, and RF, respectively, which show the superiority of the XGBoost model in CSGPoC estimation. In conclusion, the XGBoost model is capable of more accurately predicting CSGPoC than DT and RF models.

2023 (106 papers)
2023-12-31 — Prediction of PM2.5 using Super Learner Ensemble

Authors: Ji-su Park, Yu-jeong Song, M. Suh, Chansoo Kim
Year: 2023
Publication Date: 2023-12-31
Venue: Journal of Korean Society for Atmospheric Environment
DOI: 10.5572/kosae.2023.39.6.1038
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023-12-30 — GenAI for Omics: Diffusion-Augmented Pipelines and Graph-Causal Pathway Inference

Authors: Murali Krishna Pasupuleti
Year: 2023
Publication Date: 2023-12-30
Venue: International Journal of Academic and Industrial Research Innovations(IJAIRI)
DOI: 10.62311/nesx/rp-301223-160-173
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
High-dimensional multi-omics studies face severe small-sample regimes, batch effects, and pathway redundancy that degrade discovery power and causal interpretability. This study proposes a generative-AI pipeline that integrates denoising diffusion augmentation with graph-causal pathway inference to improve sensitivity and counterfactual validity. The approach ingests harmonized transcriptomic, proteomic, and metabolomic matrices; class-conditioned diffusion models generate plausibility-constrained synthetic samples using feature priors and sparsity masks; graph neural encoders operate on curated pathway graphs to produce pathway activity scores; and targeted learning (TMLE/DR-learner) estimates pathway-level effects on clinical outcomes with bias-variance control. Experiments on synthetic yet realistic benchmarks emulating KEGG/Reactome topologies, cross-platform noise, and covariate shift indicate that diffusion-augmented training improves pathway detection AUROC by 4–8 points and reduces false discovery rate (BH-FDR) by 6–10% at matched power; calibration error (ECE) decreases by 20–35% and counterfactual coverage improves toward nominal levels with ≤12% runtime overhead relative to non-augmented baselines. Implications include enhanced robustness to batch effects and improved transportability across cohorts; limitations arise from reliance on curated pathway graphs, synthetic evaluation, and potential diffusion over-smoothing if priors are mis-specified. Why it matters: the pipeline couples generative realism with causal estimands, enabling pathway-level decisions that are both data-efficient and policy-relevant. What’s inside: modular training/inference recipes, hyperparameter ranges, diagnostics for diffusion fidelity and causal identifiability, and an end-to-end evaluation protocol. Keywords: multi-omics, denoising diffusion models, data augmentation, graph neural networks, pathway analysis, targeted maximum likelihood estimation, double robustness, causal graphs, KEGG, Reactome, calibration, counterfactual inference

2023-12-30 — AI-Augmented Real-World Evidence in Stata: Targeted Learning for Treatment Effects

Authors: Murali Krishna Pasupuleti
Year: 2023
Publication Date: 2023-12-30
Venue: International Journal of Academic and Industrial Research Innovations(IJAIRI)
DOI: 10.62311/nesx/rp-301223-127-144
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Real-world evidence (RWE) studies face bias from confounding, missingness, and informative censoring, limiting credible effect estimation for policy and practice. This study proposes an AI-augmented targeted learning workflow in Stata for average and heterogeneous treatment effects. The approach integrates programmatic cohort construction, principled handling of missing data and censoring, and automated model selection via a Super Learner ensemble that screens candidate learners using cross-validation and meta-learning heuristics. Methods include estimation of nuisance functions for propensity and outcome, targeted maximum likelihood estimation (TMLE) for average treatment effects, and doubly robust DR-learners for conditional effects; robustness checks cover overlap diagnostics, sensitivity to unmeasured confounding, and transportability assessments across sites. In synthetic yet realistic RWE benchmarks, the pipeline attains lower absolute bias and shorter confidence intervals while retaining nominal coverage, with stable runtimes suitable for routine analyses. Implications include an auditable, reproducible pathway to deploy targeted learning within standard Stata workflows, enabling decision-grade RWE under common data irregularities; limitations involve reliance on positivity, data quality, and the external validity of simulated scenarios. This matters because health and policy stakeholders require defensible causal estimates from imperfect observational data. What is provided is an end-to-end template, with diagnostics and reporting artifacts aligned to regulatory expectations. Keywords: Real-world evidence, Stata, Targeted maximum likelihood estimation, Super Learner, Doubly robust estimation, Propensity score, Causal inference, Treatment effect heterogeneity, DR-learner, Sensitivity analysis, Positivity and overlap, Reproducible analytics

2023-12-21 — Using machine learning to forecast domestic homicide via police data and super learning

Authors: Jacob Verrey, Barak Ariel, Vincent Harinam, Luke Dillon
Year: 2023
Publication Date: 2023-12-21
Venue: Scientific Reports
DOI: 10.1038/s41598-023-50274-2
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
We explore the feasibility of using machine learning on a police dataset to forecast domestic homicides. Existing forecasting instruments based on ordinary statistical instruments focus on non-fatal revictimization, produce outputs with limited predictive validity, or both. We implement a “super learner,” a machine learning paradigm that incorporates roughly a dozen machine learning models to increase the recall and AUC of forecasting using any one model. We purposely incorporate police records only, rather than multiple data sources, to illustrate the practice utility of the super learner, as additional datasets are often unavailable due to confidentiality considerations. Using London Metropolitan Police Service data, our model outperforms all extant domestic homicide forecasting tools: the super learner detects 77.64% of homicides, with a precision score of 18.61% and a 71.04% Area Under the Curve (AUC), which, collectively and severely, are assessed as “excellent.” Implications for theory, research, and practice are discussed.

2023-12-12 — Application of multi-angle spaceborne observations in characterizing the long-term particulate organic carbon pollution in China

Authors: Yun Hang, Qiang Pu, Qiao Zhu, Xia Meng, Zhihao Jin, F. Liang, Hezhong Tian, Tiantian Li, Tijian Wang, Junji Cao, Qingyan Fu, Sagnik Dey, Shenshen Li, Kan Huang, Haidong Kan, Xiaoming Shi, Yang Liu
Year: 2023
Publication Date: 2023-12-12
Venue: Research Square
DOI: 10.21203/rs.3.rs-3734829/v1
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Ambient PM2.5 pollution is recognized as a leading environmental risk factor, causing significant mortality and morbidity in China. However, the specific contributions of individual PM2.5 constituents remain unclear, primarily due to the lack of a comprehensive ground monitoring network for constituents. This issue is particularly critical for carbonaceous species such as organic carbon (OC) and elemental carbon (EC), which are known for their significant health impacts, and understanding the OC/EC ratio is crucial for identifying pollution sources. To address this, we developed a Super Learner model integrating Multi-angle Imaging SpectroRadiometer (MISR) retrievals to predict daily OC concentrations across China from 2003 to 2019 at a 10-km spatial resolution. Our model demonstrates robust predictive accuracy, as evidenced by a random cross-validation R2 of 0.84 and an RMSE of 4.9 μg/m3, at the daily level. Although MISR is a polar-orbiting instrument, its fractional aerosol data make a significant contribution to the OC exposure model. We then use the model to explore the spatiotemporal distributions of OC and further calculate the EC/OC ratio in China. We compared regional pollution discrepancies and source contributions of carbonaceous pollution over three selected regions: Beijing-Tianjin-Hebei, Fenwei Plain, and Yunnan Province. Our model observes that OC levels are elevated in Northern China due to industrial operations and central heating during the heating season, while in Yunnan, OC pollution is mainly contributed by local forest fires during fire seasons. Additionally, we found that OC pollution in China is likely influenced by climate phenomena such as the El Niño-Southern Oscillation. Considering that climate change is increasing the severity of OC concentrations with more frequent fire events, and its influence on OC formation and dispersion, we suggest emphasizing the role of climate change in future OC pollution control policies. We believe this study will contribute to future epidemiological studies on OC, aiding in refining public health guidelines and enhancing air quality management in China.

2023-12-07 — The application of target trials with longitudinal targeted maximum likelihood estimation to assess the effect of alcohol consumption in adolescence on depressive symptoms in adulthood.

Authors: Yan Liu, Mireille E. Schnitzer, Ronald Herrera, Iván Díaz, Jennifer O'Loughlin, M. Sylvestre
Year: 2023
Publication Date: 2023-12-07
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwad241
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2023-12-01 — MSR75 Correlate: Assessing Dose Effect Using Targeted Maximum Likelihood Estimation (TMLE)

Authors: J. Grossman, M. Ghadessi, A. Contijoch, H. Ostojic, A. Cervantes, J.M. O'Connor, M. Ducreux
Year: 2023
Publication Date: 2023-12-01
Venue: Value in Health
DOI: 10.1016/j.jval.2023.09.2134
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2023-12-01 — MSR102 Advancing Causal Inference With Machine Learning and Real-World Data: An Application of Targeted Machine Learning and Super Learners on Hospital-Acquired Pressure Injuries From MIMIC IV

Authors: A. Wilson, M. Gregg, E. Streja, J. Alderden, J. Vanderpuye-Orgle, M. Roessner
Year: 2023
Publication Date: 2023-12-01
Venue: Value in Health
DOI: 10.1016/j.jval.2023.09.2161
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023-11-30 — Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference

Authors: Kaiwen Hou
Year: 2023
Publication Date: 2023-11-30
Venue: arXiv.org
DOI: 10.48550/arXiv.2311.18826
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
This paper presents a groundbreaking approach to causal inference by integrating continuous normalizing flows (CNFs) with parametric submodels, enhancing their geometric sensitivity and improving upon traditional Targeted Maximum Likelihood Estimation (TMLE). Our method employs CNFs to refine TMLE, optimizing the Cramér-Rao bound and transitioning from a predefined distribution $p_0$ to a data-driven distribution $p_1$. We innovate further by embedding Wasserstein gradient flows within Fokker-Planck equations, thus imposing geometric structures that boost the robustness of CNFs, particularly in optimal transport theory. Our approach addresses the disparity between sample and population distributions, a critical factor in parameter estimation bias. We leverage optimal transport and Wasserstein gradient flows to develop causal inference methodologies with minimal variance in finite-sample settings, outperforming traditional methods like TMLE and AIPW. This novel framework, centered on Wasserstein gradient flows, minimizes variance in efficient influence functions under distribution $p_t$. Preliminary experiments showcase our method's superiority, yielding lower mean-squared errors compared to standard flows, thereby demonstrating the potential of geometry-aware normalizing Wasserstein flows in advancing statistical modeling and inference.

2023-11-25 — Using Multi-Modal Electronic Health Record Data for the Development and Validation of Risk Prediction Models for Long COVID Using the Super Learner Algorithm

Authors: Weijia Jin, W. Hao, Xu Shi, L. Fritsche, M. Salvatore, A. Admon, Christopher R. Friese, B. Mukherjee
Year: 2023
Publication Date: 2023-11-25
Venue: Journal of Clinical Medicine
DOI: 10.3390/jcm12237313
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Post-Acute Sequelae of COVID-19 (PASC) have emerged as a global public health and healthcare challenge. This study aimed to uncover predictive factors for PASC from multi-modal data to develop a predictive model for PASC diagnoses. Methods: We analyzed electronic health records from 92,301 COVID-19 patients, covering medical phenotypes, medications, and lab results. We used a Super Learner-based prediction approach to identify predictive factors. We integrated the model outputs into individual and composite risk scores and evaluated their predictive performance. Results: Our analysis identified several factors predictive of diagnoses of PASC, including being overweight/obese and the use of HMG CoA reductase inhibitors prior to COVID-19 infection, and respiratory system symptoms during COVID-19 infection. We developed a composite risk score with a moderate discriminatory ability for PASC (covariate-adjusted AUC (95% confidence interval): 0.66 (0.63, 0.69)) by combining the risk scores based on phenotype and medication records. The combined risk score could identify 10% of individuals with a 2.2-fold increased risk for PASC. Conclusions: We identified several factors predictive of diagnoses of PASC and integrated the information into a composite risk score for PASC prediction, which could contribute to the identification of individuals at higher risk for PASC and inform preventive efforts.

2023-11-25 — Modeling wildland fire burn severity in California using a spatial Super Learner approach

Authors: Nicholas Simafranca, Bryant Willoughby, Erin O’Neil, Sophie Farr, Brian J. Reich, Naomi Giertych, Margaret C. Johnson, Madeleine Pascolini-Campbell
Year: 2023
Publication Date: 2023-11-25
Venue: Environmental and Ecological Statistics
DOI: 10.1007/s10651-024-00601-1
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Given the increasing prevalence of wildland fires in the Western US, there is a critical need to develop tools to understand and accurately predict burn severity. We develop a novel machine learning model to predict post-fire burn severity using pre-fire remotely sensed data. Hydrological, ecological, and topographical variables collected from four regions of California — the site of the Kincade fire (2019), the CZU Lightning Complex fire (2020), the Windy fire (2021), and the KNP Fire (2021) — are used as predictors of the differenced normalized burn ratio. We hypothesize that a Super Learner (SL) algorithm that accounts for spatial autocorrelation using Vecchia’s Gaussian approximation will accurately model burn severity. We use a cross-validation study to show that the spatial SL model can predict burn severity with reasonable classification accuracy, including high burn severity events. After fitting and verifying the performance of the SL model, we use interpretable machine learning tools to determine the main drivers of severe burn damage, including greenness, elevation, and fire weather variables. These findings provide actionable insights that enable communities to strategize interventions, such as early fire detection systems, pre-fire season vegetation clearing activities, and resource allocation during emergency responses. When implemented, this model has the potential to minimize the loss of human life, property, resources, and ecosystems in California.

2023-11-20 — Two‐stage targeted maximum likelihood estimation for mixed aggregate and individual participant data analysis with an application to multidrug resistant tuberculosis

Authors: Arman Alam Siddique, M. Schnitzer, Narayanaswamy Balakrishnan, G. Sotgiu, Mario H. Vargas, D. Menzies, A. Benedetti
Year: 2023
Publication Date: 2023-11-20
Venue: Statistics in Medicine
DOI: 10.1002/sim.9963
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In this study, we develop a new method for the meta‐analysis of mixed aggregate data (AD) and individual participant data (IPD). The method is an adaptation of inverse probability weighted targeted maximum likelihood estimation (IPW‐TMLE), which was initially proposed for two‐stage sampled data. Our methods are motivated by a systematic review investigating treatment effectiveness for multidrug resistant tuberculosis (MDR‐TB) where the available data include IPD from some studies but only AD from others. One complication in this application is that participants with MDR‐TB are typically treated with multiple antimicrobial agents where many such medications were not observed in all studies considered in the meta‐analysis. We focus here on the estimation of the expected potential outcome while intervening on a specific medication but not intervening on any others. Our method involves the implementation of a TMLE that transports the estimation from studies where the treatment is observed to the full target population. A second weighting component adjusts for the studies with missing (inaccessible) IPD. We demonstrate the properties of the proposed method and contrast it with alternative approaches in a simulation study. We finally apply this method to estimate treatment effectiveness in the MDR‐TB case study.

2023-11-18 — Protocol of the Comparison of Intravesical Therapy and Surgery as Treatment Options (CISTO) study: a pragmatic, prospective multicenter observational cohort study of recurrent high-grade non-muscle invasive bladder cancer

Authors: John L. Gore, Erika M Wolff, Bryan A. Comstock, Kristin M Follmer, Michael G Nash, Anirban Basu, Stephanie Chisolm, Doug B MacLean, Jenney R. Lee, Y. Lotan, S. Porten, Gary D. Steinberg, Sam S. Chang, Scott M. Gilbert, Larry G Kessler, Angela B. Smith, P. Heagerty, On H. Ho, Sung Min Kim, Solange Mecham, Christopher Nefcy, J. Bassett, T. Bivalacqua, K. Chamie, David Y T Chen, S. Daneshmand, R. Dickstein, A. Gadzinski, Thomas J. Guzzo, A. Kamat, M. Kates, J. Kukreja, Brian R. Lane, Eugene K. Lee, L. Macleod, Ahmed M. Mansour, V. Master, Parth K. Modi, J. Montgomery, David S. Morris, M. Mossanen, K. Nepple, Jeffrey W. Nix, Brock B O'Neil, Sanjay Patel, Charles C. Peyton, K. Pohar, C. Ritch, Alex Sankin, K. Scarpato, Neal D. Shore, M. Tyson, M. Westerman, S. Woldu, Jonathan L. Wright, Fred Almeida, Mary Beth Ballard Murray, Nancy Lindsey, R. Lipman, Rick M. Oliver, Lori A. Roscoe, Karen Sachse, James W. F. Catto, Tracy M Downs, Tullika Garg, E. Gibb, Jennifer L. Malin, Jennifer M. Taylor
Year: 2023
Publication Date: 2023-11-18
Venue: BMC Cancer
DOI: 10.1186/s12885-023-11605-8
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background Bladder cancer poses a significant public health burden, with high recurrence and progression rates in patients with non-muscle-invasive bladder cancer (NMIBC). Current treatment options include bladder-sparing therapies (BST) and radical cystectomy, both with associated risks and benefits. However, evidence supporting optimal management decisions for patients with recurrent high-grade NMIBC remains limited, leading to uncertainty for patients and clinicians. The CISTO (Comparison of Intravesical Therapy and Surgery as Treatment Options) Study aims to address this critical knowledge gap by comparing outcomes between patients undergoing BST and radical cystectomy. Methods The CISTO Study is a pragmatic, prospective observational cohort trial across 36 academic and community urology practices in the US. The study will enroll 572 patients with a diagnosis of recurrent high-grade NMIBC who select management with either BST or radical cystectomy. The primary outcome is health-related quality of life (QOL) at 12 months as measured with the EORTC-QLQ-C30. Secondary outcomes include bladder cancer-specific QOL, progression-free survival, cancer-specific survival, and financial toxicity. The study will also assess patient preferences for treatment outcomes. Statistical analyses will employ targeted maximum likelihood estimation (TMLE) to address treatment selection bias and confounding by indication. Discussion The CISTO Study is powered to detect clinically important differences in QOL and cancer-specific survival between the two treatment approaches. By including a diverse patient population, the study also aims to assess outcomes across the following patient characteristics: age, gender, race, burden of comorbid health conditions, cancer severity, caregiver status, social determinants of health, and rurality. Treatment outcomes may also vary by patient preferences, health literacy, and baseline QOL. 
The CISTO Study will fill a crucial evidence gap in the management of recurrent high-grade NMIBC, providing evidence-based guidance for patients and clinicians in choosing between BST and radical cystectomy. The CISTO study will provide an evidence-based approach to identifying the right treatment for the right patient at the right time in the challenging clinical setting of recurrent high-grade NMIBC. Trial registration ClinicalTrials.gov, NCT03933826. Registered on May 1, 2019.

2023-11-17 — Evaluating a Targeted Minimum Loss-Based Estimator for Capture-Recapture Analysis: An Application to HIV Surveillance in San Francisco, California

Authors: P. Wesson, Manjari Das, Mia Chen, Ling Hsu, Willi McFarland, Edward H. Kennedy, Nicholas P. Jewell
Year: 2023
Publication Date: 2023-11-17
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwad231
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
The capture-recapture method is a common tool used in epidemiology to estimate the size of “hidden” populations and correct the underascertainment of cases, based on incomplete and overlapping lists of the target population. Log-linear models are often used to estimate the population size yet may produce implausible and unreliable estimates due to model misspecification and small cell sizes. A novel targeted minimum loss-based estimation (TMLE) model developed for capture-recapture makes several notable improvements to conventional modeling: “targeting” the parameter of interest, flexibly fitting the data to alternative functional forms, and limiting bias from small cell sizes. Using simulations and empirical data from the San Francisco, California, Department of Public Health’s human immunodeficiency virus (HIV) surveillance registry, we evaluated the performance of the TMLE model and compared results with those of other common models. Based on 2,584 people observed on 3 lists reportable to the surveillance registry, the TMLE model estimated the number of San Francisco residents living with HIV as of December 31, 2019, to be 13,523 (95% confidence interval: 12,222, 14,824). This estimate, compared with a “ground truth” of 12,507, was the most accurate and precise of all models examined. The TMLE model is a significant advancement in capture-recapture studies, leveraging modern statistical methods to improve estimation of the sizes of hidden populations.
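Illustrative note (not from the paper): the basic idea of capture-recapture can be sketched with the classical two-list Lincoln-Petersen estimator and Chapman's small-sample correction. This is a much simpler technique than the three-list TMLE model the abstract describes, and the list counts below are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list population size estimate: n1 * n2 / m.

    n1, n2: number of people on list 1 and on list 2.
    m: number of people appearing on both lists (must be > 0).
    """
    if m <= 0:
        raise ValueError("the lists must overlap for the estimate to exist")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, which reduces small-sample bias and tolerates m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, chosen only for illustration.
lp_estimate = lincoln_petersen(200, 150, 60)
chapman_estimate = chapman(200, 150, 60)
print(f"Lincoln-Petersen: {lp_estimate:.1f}, Chapman: {chapman_estimate:.1f}")
```

Both estimators assume the two lists are independent samples from a closed population, which is the kind of assumption that more flexible models try to relax.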

2023-11-06 — Practical considerations for variable screening in the Super Learner

Authors: Brian D. Williamson, Drew King, Ying Huang
Year: 2023
Publication Date: 2023-11-06
Venue: The New England Journal of Statistics in Data Science
DOI: 10.51387/25-NEJSDS82
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Estimating a prediction function is a fundamental component of many data analyses. The super learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms (screeners), including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a super learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screeners should be used to protect against poor performance of any one screener, similar to the guidance for choosing a library of prediction algorithms for the super learner. These results are further illustrated through the analysis of HIV-1 antibody data.

2023-11-04 — High performance machine learning approach for reference evapotranspiration estimation

Authors: M. S. Aly, Saad M. Darwish, Ahmed A. Aly
Year: 2023
Publication Date: 2023-11-04
Venue: Stochastic environmental research and risk assessment (Print)
DOI: 10.1007/s00477-023-02594-y
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Accurate reference evapotranspiration (ET_0) estimation plays an effective role in reducing water losses and raising the efficiency of irrigation water management. The complicated nature of the evapotranspiration process is illustrated by the number of meteorological variables required to estimate ET_0. Incomplete meteorological data is the most significant challenge that confronts ET_0 estimation. For this reason, different machine learning techniques have been employed to predict ET_0, but the complicated structures and architectures of many of them make ET_0 estimation very difficult. To address these challenges, ensemble learning techniques are frequently employed for estimating ET_0, particularly when there is a shortage of meteorological data. This paper introduces a powerful super learner ensemble technique for ET_0 estimation, in which four machine learning models (Extra Tree Regressor, Support Vector Regressor, K-Nearest Neighbor and AdaBoost Regression) serve as the base learners and their outputs are used as training data for the meta learner. Overcoming the overfitting problem that affects most other ensemble methods is a significant advantage of this cross-validation theory-based approach. Super learner performance was compared with that of the base learners using several statistical measures across different combinations of input variables. The super learner was more accurate than the base learners: its Coefficient of Determination (R^2) ranged from 0.9279 to 0.9994 and its Mean Squared Error (MSE) from 0.0026 to 0.3289 mm/day, whereas for the base learners R^2 ranged from 0.5592 to 0.9977 and MSE from 0.0896 to 2.0118 mm/day. The super learner is therefore highly recommended for ET_0 prediction with limited meteorological data.
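Illustrative note (not from the paper): the stacking idea described above, where base learners' out-of-fold predictions become the training data for a meta learner, can be sketched in plain Python. This is a minimal sketch under stated assumptions: the two toy base learners (a constant mean and a one-variable least-squares line) stand in for the paper's four models, the meta learner is a simple convex weight chosen by grid search, and the data are simulated.

```python
import random


def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Toy base learner 2: one-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_predictions(fit, xs, ys, k=5):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == fold:
                preds[i] = model(xs[i])
    return preds


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def best_weight(z_a, z_b, ys, steps=100):
    """Meta learner: convex weight on z_a that minimizes cross-validated MSE."""
    candidates = []
    for i in range(steps + 1):
        w = i / steps
        blend = [w * a + (1 - w) * b for a, b in zip(z_a, z_b)]
        candidates.append((mse(blend, ys), w))
    return min(candidates)  # (lowest MSE, weight on z_a)


# Simulated data, for illustration only.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + random.gauss(0, 0.5) for x in xs]

z_line = cv_predictions(fit_line, xs, ys)
z_mean = cv_predictions(fit_mean, xs, ys)
ens_mse, w_line = best_weight(z_line, z_mean, ys)
print(f"weight on line learner: {w_line:.2f}, ensemble CV MSE: {ens_mse:.3f}")
```

Because the weight grid includes 0 and 1, the blended cross-validated MSE can never be worse than that of either base learner on the same folds, which is the intuition behind the ensemble's protection against a poor individual learner.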

2023-11-03 — Schizophrenia and Bipolar Psychosis Classification with rsfMRI Functional Connectivity Feature Fusion technique using Super Learner

Authors: Srikireddy Dhanunjay Reddy, Kumar Gaurav, Tharun Kumar Reddy
Year: 2023
Publication Date: 2023-11-03
Venue: 2023 IEEE Silchar Subsection Conference (SILCON)
DOI: 10.1109/SILCON59133.2023.10404202
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Schizophrenia and bipolar psychosis are intricate mental disorders that share similar clinical characteristics, making it difficult to establish precise diagnoses and classifications. In recent times, neuroimaging has displayed the potential to deepen our comprehension of these disorders. This study presents an enhanced framework to classify schizophrenia (SZ) and bipolar psychosis (BP) by combining resting-state functional magnetic resonance imaging (rsfMRI) functional connectivity and statistical features. Merging information from Intrinsic Connectivity Networks (ICN), Functional Network Connectivity (FNC), Kurtosis (K), and Power Spectral Density (PSD) improves the classifying ability of the weighted voting super learner. The experimental results demonstrate that the proposed framework outperforms existing methods in terms of both prediction accuracy and computational efficiency with a mean gain of 6%. Moreover, the proposed technique achieves high classification performance while maintaining a low memory footprint and computational cost, making it well suited to classifying the two closely overlapping classes with good precision.

2023-11-01 — Bystander defibrillation increases 30-day survival even with short emergency medical service response time

Authors: M. Hindborg, H. Yonis, F. Gnesin, K. Soerensen, C. Torp-Pedersen
Year: 2023
Publication Date: 2023-11-01
Venue: European Heart Journal
DOI: 10.1093/eurheartj/ehad655.3035
Link: Semantic Scholar
Matched Keywords: super learning, targeted maximum likelihood estimation, tmle

Abstract:
Rates of bystander initiated defibrillation are increasing in many areas of the world (1-4), but the effect of defibrillation at different time points after recognition of out-of-hospital cardiac arrest (OHCA) remains largely unknown. To assess the effect on 30-day survival of bystander defibrillation at different intervals of emergency medical service (EMS) response time. We included OHCAs from 2016 through 2020 from the Danish Cardiac Arrest Registry. Cases were included if they were ≥18 years of age, bystander witnessed, received bystander cardiopulmonary resuscitation (CPR), had EMS response time of 25 minutes or less and only the first OHCA for each subject was considered. We excluded cases witnessed by EMS or if they had missing values of OHCA-related variables. Crude survival proportions were calculated for each minute of EMS response time. Relative risks (RR) and corresponding 95 % confidence intervals (95 % CI) of the outcome (30-day survival) for eight different intervals of EMS response time were estimated using a causal inference framework utilizing Targeted Maximum Likelihood Estimation (TMLE) with ensemble Super Learning, adjusting for age, sex, place of arrest (public/private), initial cardiac rhythm (shockable/not shockable), and comorbidities (prior acute myocardial infarction, ischemic heart disease, heart failure, chronic obstructive pulmonary disease, stroke, cancer). We included 7,471 cases of bystander witnessed OHCA receiving CPR before EMS arrival. Of these, 14.7 % (1,098/7,471) received bystander defibrillation before arrival of EMS. Overall, 44.5 % (489/1,098) survived to 30 days when bystander defibrillation was performed versus 18.8 % (1,200/6,373) when no bystander defibrillation was performed. 
When examining the crude survival proportion for each minute of EMS response time, we found that 30-day survival was consistently higher in the group receiving bystander defibrillation for the first 20 minutes after which the survival was approximately the same for the two groups (Figure 1). When adjusting for confounders, we found statistically significantly increased relative risks of 30-day survival for patients receiving bystander defibrillation, compared to patients not bystander defibrillated, for all intervals of EMS response time, except for response times of 0-2 min., where the increase did not reach statistical significance (0-2 min: RR; 1.34 [95 % CI: 0.88-2.05], 2-4 min: RR; 1.37 [95 % CI: 1.10-1.70], 4-6 min: RR; 1.55 [95 % CI: 1.33-1.80], 6-8 min: RR; 2.23 [95 % CI: 1.86-2.67], 8-10 min: RR; 1.99 [95 % CI: 1.55-2.55], 10-12 min: RR; 1.89 [95 % CI: 1.34-2.68], 12-15 min: RR; 1.86 [95 % CI: 1.22-2.84], >15 min: RR; 1.98 [95 % CI: 1.16-3.38]) (Figure 2). For all intervals of EMS response time, bystander defibrillation increased 30-day survival of OHCA. The effect was significant for EMS response times as short as 2-4 minutes.
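Illustrative note (not from the paper): the overall crude survival counts quoted in the abstract (489/1,098 with bystander defibrillation versus 1,200/6,373 without) imply an unadjusted relative risk that can be computed directly. The sketch below uses the standard log-scale Wald interval; it is not the TMLE analysis the authors performed, so it does not adjust for confounders and is not expected to match the interval-specific adjusted RRs.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Crude risk ratio with a Wald confidence interval on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts taken from the abstract: 30-day survivors / group size.
rr, lower, upper = risk_ratio(489, 1098, 1200, 6373)
print(f"crude RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The gap between a crude ratio like this and the adjusted estimates is what the confounder adjustment in the abstract is meant to account for.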

2023-10-30 — Quantile Super Learning for independent and online settings with application to solar power forecasting

Authors: Herbert Susmann, Antoine Chambaz
Year: 2023
Publication Date: 2023-10-30
Venue: Computational Statistics & Data Analysis
DOI: 10.1016/j.csda.2025.108202
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2023-10-29 — concrete: Targeted Estimation of Survival and Competing Risks in Continuous Time

Authors: David Chen, H. Rytgaard, Edwin C. H. Fong, J. Tarp, Maya L Petersen, M. J. Laan, T. Gerds
Year: 2023
Publication Date: 2023-10-29
Link: Semantic Scholar
Matched Keywords: super learner, tmle

Abstract:
This article introduces the R package concrete, which implements a recently developed targeted maximum likelihood estimator (TMLE) for the cause-specific absolute risks of time-to-event outcomes measured in continuous time. Cross-validated Super Learner machine learning ensembles are used to estimate propensity scores and conditional cause-specific hazards, which are then targeted to produce robust and efficient plug-in estimates of the effects of static or dynamic interventions on a binary treatment given at baseline quantified as risk differences or risk ratios. Influence curve-based asymptotic inference is provided for TMLE estimates and simultaneous confidence bands can be computed for target estimands spanning multiple times or events. In this paper we review the one-step continuous-time TMLE methodology as it is situated in an overarching causal inference workflow, describe its implementation, and demonstrate the use of the package on the PBC dataset.

2023-10-24 — The impact of integrated care on health care utilization and costs in a socially deprived urban area in Germany: A difference-in-differences approach within an event-study framework.

Authors: V. Ress, Eva-Maria Wild
Year: 2023
Publication Date: 2023-10-24
Venue: Health Economics
DOI: 10.1002/hec.4771
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
We investigated the impact of an integrated care initiative in a socially deprived urban area in Germany. Using administrative data, we empirically assessed the causal effect of its two sub-interventions, which differed by the extent to which their instruments targeted the supply and demand side of healthcare provision. We addressed confounding using propensity score matching via the Super Learner machine learning algorithm. For our baseline model, we used a two-way fixed-effects difference-in-differences approach to identify causal effects. We then employed difference-in-differences analyses within an event-study framework to explore the heterogeneity of treatment effects over time, allowing us to disentangle the effects of the sub-interventions and improve causal interpretation and generalizability. The initiative led to a significant increase in hospital and emergency admissions and non-hospital outpatient visits, as well as inpatient, non-hospital outpatient, and total costs. Increased utilization may indicate that the intervention improved access to care or identified unmet need.
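Illustrative note (not from the paper): the core difference-in-differences contrast behind the approach named above can be written as a two-group, two-period calculation. This is a minimal sketch with hypothetical group means; the paper's two-way fixed-effects and event-study models, and its propensity score matching, are not reproduced here.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical mean outpatient visits per person, for illustration only.
effect = difference_in_differences(
    treated_pre=2.0, treated_post=2.6,
    control_pre=1.9, control_post=2.1,
)
print(f"difference-in-differences estimate: {effect:.2f}")
```

The estimate is interpretable as a causal effect only under the parallel-trends assumption, namely that the treated group would have followed the control group's change in the absence of the intervention.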

2023-10-18 — Longitudinal plasmode algorithms to evaluate statistical methods in realistic scenarios: an illustration applied to occupational epidemiology

Authors: Youssra Souli, X. Trudel, Awa Diop, C. Brisson, D. Talbot
Year: 2023
Publication Date: 2023-10-18
Venue: BMC Medical Research Methodology
DOI: 10.1186/s12874-023-02062-9
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Introduction Plasmode simulations are a type of simulation that uses real data to determine the synthetic data-generating equations. Such simulations thus allow evaluating statistical methods under realistic conditions. As far as we know, no plasmode algorithm has been proposed for simulating longitudinal data. In this paper, we propose a longitudinal plasmode framework to generate realistic data with both a time-varying exposure and time-varying covariates. This work was motivated by the objective of comparing different methods for estimating the causal effect of a cumulative exposure to psychosocial stressors at work over time. Methods We developed two longitudinal plasmode algorithms: a parametric and a nonparametric algorithm. Data from the PROspective Québec (PROQ) Study on Work and Health were used as an input to generate data with the proposed plasmode algorithms. We evaluated the performance of multiple estimators of the parameters of marginal structural models (MSMs): inverse probability of treatment weighting, g-computation and targeted maximum likelihood estimation. These estimators were also compared to standard regression approaches with either adjustment for baseline covariates only or with adjustment for both baseline and time-varying covariates. Results Standard regression methods were prone to yielding biased estimates with confidence intervals having coverage probability lower than their nominal level. The bias was much lower and coverage of confidence intervals was much closer to the nominal level when considering MSMs. Among MSM estimators, g-computation overall produced the best results relative to bias, root mean squared error and coverage of confidence intervals. No method produced unbiased estimates with adequate coverage for all parameters in the more realistic nonparametric plasmode simulation. 
Conclusion The proposed longitudinal plasmode algorithms can be important methodological tools for evaluating and comparing analytical methods in realistic simulation scenarios. To facilitate the use of these algorithms, we provide R functions on GitHub. We also recommend using MSMs when estimating the effect of cumulative exposure to psychosocial stressors at work.

2023-10-16 — Association of globalization with the burden of opioid use disorders 2019. A country-level analysis using targeted maximum likelihood estimation

Authors: Guillaume Barbalat, Geeta Reddy, N. Franck
Year: 2023
Publication Date: 2023-10-16
Venue: Globalization and Health
DOI: 10.1186/s12992-023-00980-3
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background The “opioid crisis” has been responsible for hundreds of thousands of deaths in the US, and is at risk of dissemination worldwide. Within-country studies have demonstrated that the rise of opioid use disorders (OUD) is linked to increased access to opioid prescriptions and to so-called “diseases of despair”. Both have been related to the emergence of globalization policies since the 1980s. First, globalized countries have seen a reorganization of healthcare practices towards quick and easy answers to complex needs, including increased opioid prescriptions. Second, despair has spread among those suffering from the mutations of socio-economic systems and working conditions that have accompanied globalization policies (e.g. delocalization, deindustrialization, and the decline of social services). Here, using data with high quality ratings from the Global Burden of Disease database, we evaluated the country-based association between four levels of globalization and the burden of OUD in 2019. Results The sample included 87 countries. Taking into account potential country-level confounders, we found that countries with the highest level of globalization were associated with a 31% increase in the burden of OUD in 2019 compared to those with the lowest level of globalization (mean log difference: 0.31; 95%CI, 0.04–0.57; p = 0.02). Additional analyses showed a significant effect for low back pain (mean log difference: 0.07; 95%CI, 0.02–0.12; p = 0.007). In contrast, despite sharing some of the risk factors of OUD, other mental and substance use disorders did not show any significant relationship with globalization. Finally, socio-cultural de jure globalization, which compiles indicators related to gender equality, human capital and civil rights, was specifically associated with the burden of OUD (mean log difference: 0.49; 95%CI: 0.23,0.75; p < 0.001). 
Conclusions These findings suggest that OUD may have inherent underpinnings linked to globalization, and more particularly socio-cultural aspects of globalization. Key factors may be increased rights to access prescriptions, as well as increased feelings of despair related to the erosion of local cultures and widening educational gaps.

2023-10-13 — Analysis of metabolites in human gut: illuminating the design of gut-targeted drugs

Authors: Alberto Gil-Pichardo, Andrés Sánchez-Ruiz, Gonzalo Colmenarejo
Year: 2023
Publication Date: 2023-10-13
Venue: Journal of Cheminformatics
DOI: 10.1186/s13321-023-00768-y
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Gut-targeted drugs provide a new drug modality besides that of oral, systemic molecules, that could tap into the growing knowledge of gut metabolites of bacterial or host origin and their involvement in biological processes and health through their interaction with gut targets (bacterial or host, too). Understanding the properties of gut metabolites can provide guidance for the design of gut-targeted drugs. In the present work we analyze a large set of gut metabolites, both shared with serum or present only in gut, and compare them with oral systemic drugs. We find patterns specific for these two subsets of metabolites that could be used to design drugs targeting the gut. In addition, we develop and openly share a Super Learner model to predict gut permanence, in order to aid in the design of molecules with appropriate profiles to remain in the gut, resulting in molecules with putatively reduced secondary effects and better pharmacokinetics.

2023-10-12 — Personalized dynamic super learning: an application in predicting hemodiafiltration convection volumes

Authors: Arthur Chatton, Michèle Bally, Renée Lévesque, Ivana Malenica, Robert W Platt, M. Schnitzer
Year: 2023
Publication Date: 2023-10-12
Venue: Journal of the Royal Statistical Society Series C: Applied Statistics
DOI: 10.1093/jrsssc/qlae070
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Obtaining continuously updated predictions is a major challenge for personalized medicine. Leveraging combinations of parametric regressions and machine learning algorithms, the personalized online super learner (POSL) can achieve such dynamic and personalized predictions. We adapt POSL to predict a repeated continuous outcome dynamically and propose a new way to validate such personalized or dynamic prediction models. We illustrate its performance by predicting the convection volume of patients undergoing hemodiafiltration. POSL outperformed its candidate learners with respect to median absolute error, calibration-in-the-large, discrimination, and net benefit. We finally discuss the choices and challenges underlying the use of POSL.

2023-10-12 — Optimization of Lightweight Malware Detection Models for AIoT Devices

Authors: Felicia Lo, Shin-Ming Cheng, Rafael Kaliski
Year: 2023
Publication Date: 2023-10-12
Venue: World Forum on Internet of Things
DOI: 10.1109/WF-IoT58464.2023.10539588
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Malware intrusion is problematic for Internet of Things (IoT) and Artificial Intelligence of Things (AIoT) devices as they often reside in an ecosystem of connected devices, such as a smart home. If any devices are infected, the whole ecosystem can be compromised. Although various Machine Learning (ML) models are deployed to detect malware and network intrusion, generally speaking, robust high-accuracy models tend to require resources not found in all IoT devices, compared to less robust models defined by weak learners. In order to combat this issue, Fadhilla [1] proposed a meta-learner ensemble model comprised of less robust prediction results inherent with weak learner ML models to produce a highly robust meta-learning ensemble model. The main problem with the prior research is that it cannot be deployed in low-end AIoT devices due to their limited processing power, storage, and memory (the required libraries quickly exhaust low-end AIoT devices' resources). Hence, this research aims to optimize the proposed super learner meta-learning ensemble model [1] to make it viable for low-end AIoT devices. We show the library and ML model memory requirements associated with each optimization stage and emphasize that optimization of current ML models is necessary for low-end AIoT devices. Our results demonstrate that we can obtain similar accuracy and False Positive Rate (FPR) metrics from high-end AIoT devices running the derived ML model, with a lower inference duration and smaller memory footprint.

2023-10-09 — Associations between blood pressure control and clinical events suggestive of nutrition care documented in electronic health records of patients with hypertension

Authors: April R. Williams, Maria D Thomson, Erin L. Britton
Year: 2023
Publication Date: 2023-10-09
Venue: BMC Medical Informatics and Decision Making
DOI: 10.1186/s12911-023-02311-3
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background Clinical events suggestive of nutrition care found in electronic health records (EHRs) are rarely explored for their associations with hypertension outcomes. Methods Longitudinal analysis using structured EHR data from primary care visits at a health system in the US from December 2017 to December 2020 of adult patients with hypertension (n = 4,237) tested for associations between last visit blood pressure (BP) control (≤ 140 Systolic BP and ≤ 90 Diastolic BP) and ≥ 1 nutrition care clinical event, operationalized as overweight or obesity (BMI > 25 or 30, respectively) diagnoses, preventive care visits, or provision of patient education materials (PEM). Descriptive statistics and longitudinal targeted maximum likelihood estimation (LTMLE) models were used to explore average treatment effects (ATE) of timing and dose response from these clinical events on blood pressure control overall and by race. Results The median age was 62 years, 29% were male, 52% were Black, 25% were from rural areas and 50% had controlled BP at baseline. Annual documentation of overweight/obesity diagnoses ranged from 3.0 to 7.8%, preventive care visits ranged from 6.2 to 15.7%, and PEM with dietary and hypertension content were distributed to 8.5–28.8% of patients. LTMLE models stratified by race showed differences in timing, dose, and type of nutrition care. Black patients who had nutrition care in Year 3 only compared to none had lower odds for BP control (ATE -0.23, 95% CI: -0.38,-0.08, p = 0.003), those with preventive visits in the last 2 years had higher odds for BP control (ATE 0.31, 95% CI: 0.07,0.54, p = 0.01), and those with early or late PEMs had lower odds for BP control (ATE -0.08, 95% CI: -0.15,-0.01, p = 0.03 and ATE -0.23, 95% CI: -0.41,-0.05, p = 0.01, respectively). Conclusions In this study, clinical events suggestive of nutrition care are significantly associated with BP control, but are infrequent and effects differ by type, timing, and patient race. 
Preventive visits appear to have the most effect; additional research should include examining clinical notes for evidence of nutrition care among different populations, which may uncover areas for improving nutrition care for patients with chronic disease.

2023-10-03 — Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner: Cohort Study

Authors: Zachary Butzin-Dozier, Yunwen Ji, Haodong Li, Jeremy Coyle, Junming Shi, Rachael V. Phillips, Andrew N. Mertens, R. Pirracchio, M. J. van der Laan, Rena C Patel, J. Colford, Alan E. Hubbard
Year: 2023
Publication Date: 2023-10-03
Venue: JMIR Public Health and Surveillance
DOI: 10.2196/53322
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background Postacute sequelae of COVID-19 (PASC), also known as long COVID, is a broad grouping of a range of long-term symptoms following acute COVID-19. These symptoms can occur across a range of biological systems, leading to challenges in determining risk factors for PASC and the causal etiology of this disorder. An understanding of characteristics that are predictive of future PASC is valuable, as this can inform the identification of high-risk individuals and future preventative efforts. However, current knowledge regarding PASC risk factors is limited. Objective Using a sample of 55,257 patients (at a ratio of 1 patient with PASC to 4 matched controls) from the National COVID Cohort Collaborative, as part of the National Institutes of Health Long COVID Computational Challenge, we sought to predict individual risk of PASC diagnosis from a curated set of clinically informed covariates. The National COVID Cohort Collaborative includes electronic health records for more than 22 million patients from 84 sites across the United States. Methods We predicted individual PASC status, given covariate information, using Super Learner (an ensemble machine learning algorithm also known as stacking) to learn the optimal combination of gradient boosting and random forest algorithms to maximize the area under the receiver operator curve. We evaluated variable importance (Shapley values) based on 3 levels: individual features, temporal windows, and clinical domains. We externally validated these findings using a holdout set of randomly selected study sites. Results We were able to predict individual PASC diagnoses accurately (area under the curve 0.874). The individual features of the length of observation period, number of health care interactions during acute COVID-19, and viral lower respiratory infection were the most predictive of subsequent PASC diagnosis. 
Temporally, we found that baseline characteristics were the most predictive of future PASC diagnosis, compared with characteristics immediately before, during, or after acute COVID-19. We found that the clinical domains of health care use, demographics or anthropometry, and respiratory factors were the most predictive of PASC diagnosis. Conclusions The methods outlined here provide an open-source, applied example of using Super Learner to predict PASC status using electronic health record data, which can be replicated across a variety of settings. Across individual predictors and clinical domains, we consistently found that factors related to health care use were the strongest predictors of PASC diagnosis. This indicates that any observational studies using PASC diagnosis as a primary outcome must rigorously account for heterogeneous health care use. Our temporal findings support the hypothesis that clinicians may be able to accurately assess the risk of PASC in patients before acute COVID-19 diagnosis, which could improve early interventions and preventive care. Our findings also highlight the importance of respiratory characteristics in PASC risk assessment. International Registered Report Identifier (IRRID) RR2-10.1101/2023.07.27.23293272

2023-09-23 — Targeted Learning on Variable Importance Measure for Heterogeneous Treatment Effect

Authors: Haodong Li, Alan E. Hubbard, M. V. D. Laan
Year: 2023
Publication Date: 2023-09-23
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Quantifying the heterogeneity of treatment effect is important for understanding how a commercial product or medical treatment affects different subgroups in a population. Beyond the overall impact reflected by parameters like the average treatment effect, the analysis of treatment effect heterogeneity further reveals details on the importance of different covariates and how they lead to different treatment impacts. One relevant parameter that addresses such heterogeneity is the variance of treatment effect across different covariate groups, however the treatment effect is defined. One can also derive variable importance parameters that measure (and rank) how much of treatment effect heterogeneity is explained by a targeted subset of covariates. In this article, we propose a new targeted maximum likelihood estimator for a treatment effect variable importance measure. This estimator is a pure plug-in estimator that consists of two steps: 1) the initial estimation of relevant components to plug in and 2) an iterative updating step to optimize the bias-variance tradeoff. The simulation results show that this TMLE estimator has competitive performance in terms of lower bias and better confidence interval coverage compared to the simple substitution estimator and the estimating equation estimator. The application of this method also demonstrates the advantage of a substitution estimator, which always respects the global constraints on the data distribution and that the estimand is a particular function of the distribution.

2023-09-20 — Optimizing dynamic predictions from joint models using super learning

Authors: D. Rizopoulos, J. M. Taylor
Year: 2023
Publication Date: 2023-09-20
Venue: Statistics in Medicine
DOI: 10.1002/sim.10010
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Joint models for longitudinal and time‐to‐event data are often employed to calculate dynamic individualized predictions used in numerous applications of precision medicine. Two components of joint models that influence the accuracy of these predictions are the shape of the longitudinal trajectories and the functional form linking the longitudinal outcome history to the hazard of the event. Finding a single well‐specified model that produces accurate predictions for all subjects and follow‐up times can be challenging, especially when considering multiple longitudinal outcomes. In this work, we use the concept of super learning and avoid selecting a single model. In particular, we specify a weighted combination of the dynamic predictions calculated from a library of joint models with different specifications. The weights are selected to optimize a predictive accuracy metric using V‐fold cross‐validation. We use as predictive accuracy measures the expected quadratic prediction error and the expected predictive cross‐entropy. In a simulation study, we found that the super learning approach produces results very similar to the Oracle model, which was the model with the best performance in the test datasets. All proposed methodology is implemented in the freely available R package JMbayes2.
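As a rough illustration of the weighting idea this abstract describes (a convex combination of candidate predictions, with weights chosen on V-fold cross-validated predictions), the sketch below uses two toy learners and squared error as the accuracy metric. It is not the JMbayes2 implementation: the learners, the function names, and the exponentiated-gradient optimizer are all assumptions made for illustration.

```python
import numpy as np

def fit_mean(X, y):
    # Toy candidate 1 (hypothetical): always predict the training mean.
    m = y.mean()
    return lambda X_new: np.full(len(X_new), m)

def fit_linear(X, y):
    # Toy candidate 2 (hypothetical): ordinary least squares with intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def cross_validated_predictions(X, y, learners, v=5, seed=0):
    # Each observation is predicted by models that never saw it in training.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), v)
    Z = np.empty((len(y), len(learners)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for j, fit in enumerate(learners):
            Z[held_out, j] = fit(X[train], y[train])(X[held_out])
    return Z

def simplex_weights(Z, y, n_iter=5000, step=0.1):
    # Exponentiated-gradient descent on mean squared error; the weights
    # stay non-negative and sum to one after every update.
    w = np.full(Z.shape[1], 1.0 / Z.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Z.T @ (Z @ w - y) / len(y)
        w = w * np.exp(-step * grad)
        w /= w.sum()
    return w

# Simulated data in which the linear learner should dominate.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

Z = cross_validated_predictions(X, y, [fit_mean, fit_linear])
weights = simplex_weights(Z, y)
print(dict(zip(["mean", "linear"], weights.round(3))))
```

In this simulated setting nearly all of the weight should go to the linear learner. In practice the candidate library would hold the competing model specifications, and the loss would be whichever predictive accuracy measure is being targeted.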

2023-09-07 — The Causal Roadmap and Simulations to Improve the Rigor and Reproducibility of Real-data Applications

Authors: Nerissa Nance, Maya Petersen, Mark van der Laan, Laura B. Balzer
Year: 2023
Publication Date: 2023-09-07
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001773
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
The Causal Roadmap outlines a systematic approach to asking and answering questions of cause and effect: define the quantity of interest, evaluate needed assumptions, conduct statistical estimation, and carefully interpret results. To protect research integrity, it is essential that the algorithm for statistical estimation and inference be prespecified prior to conducting any effectiveness analyses. However, it is often unclear which algorithm will perform optimally for the real-data application. Instead, there is a temptation to simply implement one’s favorite algorithm, recycling prior code or relying on the default settings of a computing package. Here, we call for the use of simulations that realistically reflect the application, including key characteristics such as strong confounding and dependent or missing outcomes, to objectively compare candidate estimators and facilitate full specification of the statistical analysis plan. Such simulations are informed by the Causal Roadmap and conducted after data collection but prior to effect estimation. We illustrate with two worked examples. First, in an observational longitudinal study, we use outcome-blind simulations to inform nuisance parameter estimation and variance estimation for longitudinal targeted minimum loss-based estimation. Second, in a cluster randomized trial with missing outcomes, we use treatment-blind simulations to examine type-I error control in two-stage targeted minimum loss-based estimation. In both examples, realistic simulations empower us to prespecify an estimation approach with strong expected finite sample performance, and also produce quality-controlled computing code for the actual analysis. Together, this process helps to improve the rigor and reproducibility of our research.

2023-09-07 — Bayesian-Optimization-Based Long Short-Term Memory (LSTM) Super Learner Approach for Modeling Long-Term Electricity Consumption

Authors: Salma Hamad Almuhaini, Nahid Sultana
Year: 2023
Publication Date: 2023-09-07
Venue: Sustainability
DOI: 10.3390/su151813409
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study utilized different methods, namely classical multiple linear regression (MLR), statistical approach exponential smoothing (EXPS), and deep learning algorithm long short-term memory (LSTM) to forecast long-term electricity consumption in the Kingdom of Saudi Arabia. The originality of this research lies in (1) specifying exogenous variables that significantly affect electrical consumption; (2) utilizing the Bayesian optimization algorithm (BOA) to develop individual super learner BOA-LSTM models for forecasting the residential and total long-term electric energy consumption; (3) measuring forecasting performances of the proposed super learner models with classical and statistical models, viz. MLR and EXPS, by employing the broadly used evaluation measures regarding the computational efficiency, model accuracy, and generalizability; and finally (4) estimating forthcoming yearly electric energy consumption and validation. Population, gross domestic products, imports, and refined oil products significantly impact residential and total annual electricity consumption. The coefficient of determination (R²) for all the proposed models is greater than 0.93, representing an outstanding fitting of the models with historical data. Moreover, the developed BOA-LSTM models have the best performance with R² > 0.99, enhancing the predicting accuracy (Mean Absolute Percentage Error (MAPE)) by 59.6% and 54.8% compared to the MLR and EXPS models, respectively, of total annual electricity consumption. This forecasting accuracy in residential electricity consumption for the BOA-LSTM model is improved by 62.7% and 68.9% compared to the MLR and EXPS models. This study achieved a higher accuracy and consistency of the proposed super learner model in long-term electricity forecasting, which can be utilized in energy strategy management to secure the sustainability of electric energy.

2023-09-02 — RETRACTED: Software effort estimation using stacked ensembled techniques and proposed stacking ensemble using principal component regression as super learner

Authors: A. G. Priya Varshini, K. Anitha Kumari
Year: 2023
Publication Date: 2023-09-02
Venue: Journal of Intelligent & Fuzzy Systems
DOI: 10.3233/JIFS-230676
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This article has been retracted. A retraction notice can be found at https://doi.org/10.3233/JIFS-219434.

2023-09-01 — Who is most at risk of dying if infected with SARS-CoV-2? A mortality risk factor analysis using machine learning of patients with COVID-19 over time: a large population-based cohort study in Mexico

Authors: Lauren D. Liao, Alan E. Hubbard, J. P. Gutiérrez, Arturo Juárez-Flores, Kendall Kikkawa, Ronit Gupta, Yana Yarmolich, Iván de Jesús Ascencio-Montiel, S. Bertozzi
Year: 2023
Publication Date: 2023-09-01
Venue: BMJ Open
DOI: 10.1136/bmjopen-2023-072436
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Objective COVID-19 would kill fewer people if health programmes can predict who is at higher risk of mortality because resources can be targeted to protect those people from infection. We predict mortality in a very large population in Mexico with machine learning using demographic variables and pre-existing conditions. Design Cohort study. Setting March 2020 to November 2021 in Mexico, nationally represented. Participants 1.4 million laboratory-confirmed patients with COVID-19 in Mexico at or over 20 years of age. Primary and secondary outcome measures Analysis is performed on data from March 2020 to November 2021 and over three phases: (1) from March to October in 2020, (2) from November 2020 to March 2021 and (3) from April to November 2021. We predict mortality using an ensemble machine learning method, super learner, and independently estimate the adjusted mortality relative risk of each pre-existing condition using targeted maximum likelihood estimation. Results Super learner fit has a high predictive performance (C-statistic: 0.907), where age is the most predictive factor for mortality. After adjusting for demographic factors, renal disease, hypertension, diabetes and obesity are the most impactful pre-existing conditions. Phase analysis shows that the adjusted mortality risk decreased over time while relative risk increased for each pre-existing condition. Conclusions While age is the most important predictor of mortality, younger individuals with hypertension, diabetes and obesity are at comparable mortality risk as individuals who are 20 years older without any of the three conditions. Our model can be continuously updated to identify individuals who should most be protected against infection as the pandemic evolves.

2023-09-01 — Multi-class Classification of Ionospheric Scintillations Using SMOTE-Super Learner Ensemble Technique

Authors: I. Srivani, M. Sridhar, K.C.T. Swamy, D. Venkata Ratnam
Year: 2023
Publication Date: 2023-09-01
Venue: Advances in Space Research
DOI: 10.1016/j.asr.2023.09.039
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023-08-27 — Science Education for Preschoolers Based on Superstar Learning Access Course Blended Teaching Practice

Authors: Yuching Li
Year: 2023
Publication Date: 2023-08-27
Venue: World Journal of Education and Humanities
DOI: 10.22158/wjeh.v5n3p143
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
The rapid development of information technology is changing and even subverting the traditional mode of education and teaching practice. Mixed teaching is an effective way to reform the teaching mode, realize resource sharing, and improve the quality of talent training. Starting from the definition of hybrid teaching, and taking mastery learning theory and active learning theory as its theoretical basis, this paper describes the hybrid teaching practice of a preschool children's science education course based on the super learning online platform. It further discusses teachers and students across three stages (before, during, and after class), in order to provide experience for mixed teaching reform.

2023-08-23 — Combining Super Learner with high‐dimensional propensity score to improve confounding adjustment: A real‐world application in chronic lymphocytic leukemia

Authors: Neil Dhopeshwarkar, Wei Yang, S. Hennessy, Joanna M. Rhodes, A. Cuker, Charles E. Leonard
Year: 2023
Publication Date: 2023-08-23
Venue: Pharmacoepidemiology and Drug Safety
DOI: 10.1002/pds.5678
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
High‐dimensional propensity score (hdPS) is a semiautomated method that leverages a vast number of covariates available in healthcare databases to improve confounding adjustment. A novel combined Super Learner (SL)‐hdPS approach was proposed to assist with selecting the number of covariates for propensity score inclusion, and was found in plasmode simulation studies to improve bias reduction and precision compared to hdPS alone. However, the approach has not been examined in the applied setting.

2023-08-19 — Association of pulmonary artery catheter with in-hospital outcomes after cardiac surgery in the United States: National Inpatient Sample 1999–2019

Authors: Hind A Beydoun, M. Beydoun, Shaker M Eid, A. Zonderman
Year: 2023
Publication Date: 2023-08-19
Venue: Scientific Reports
DOI: 10.1038/s41598-023-40615-6
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
To examine associations of pulmonary artery catheter (PAC) use with in-hospital death and hospital length of stay (days) overall and within subgroups of hospitalized cardiac surgery patients. Secondary analyses of 1999–2019 National Inpatient Sample data were performed using 969,034 records (68% male, mean age: 65 years) representing adult cardiac surgery patients in the United States. A subgroup of 323,929 records corresponded to patients with congestive heart failure, pulmonary hypertension, mitral/tricuspid valve disease and/or combined surgeries. We evaluated PAC in relation to clinical outcomes using regression and targeted maximum likelihood estimation (TMLE). Hospitalized cardiac surgery patients experienced more in-hospital deaths and longer stays if they had ≥ 1 subgroup characteristics. For risk-adjusted models, in-hospital deaths were similar among recipients and non-recipients of PAC (odds ratio [OR] 1.04, 95% confidence interval [CI] 0.96, 1.12), although PAC was associated with more in-hospital deaths among the subgroup with congestive heart failure (OR 1.14, 95% CI 1.03, 1.26). PAC recipients experienced shorter stays than non-recipients (β = − 0.40, 95% CI − 0.64, − 0.15), with variations by subgroup. We obtained comparable results using TMLE. In this retrospective cohort study, PAC was associated with shorter stays and similar in-hospital death rates among cardiac surgery patients. Worse clinical outcomes associated with PAC were observed only among patients with congestive heart failure. Prospective cohort studies and randomized controlled trials are needed to confirm and extend these preliminary findings.

2023-08-08 — SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling

Authors: M. Vowels
Year: 2023
Publication Date: 2023-08-08
Venue: arXiv.org
DOI: 10.48550/arXiv.2308.04365
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.

2023-08-08 — An Optimized Stacking Ensemble Learning Model Using 3-Pyramids Technique for the 2006 CHF Groeneveld Look Table Prediction

Authors: M. Djeddou, J. Dallal, Aouatef Hellal, Ibrahim A. Hameed, I. Loukam, M. F. Kabir, M. Bouhicha
Year: 2023
Publication Date: 2023-08-08
Venue: 2023 International Conference on Applied Mathematics & Computer Science (ICAMCS)
DOI: 10.1109/ICAMCS59110.2023.00022
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Typically, critical heat flux (CHF) look-up tables constructed based on physical experiments have limited data. Using available experimental-based CHF look-up tables as training data, machine learning techniques can be applied to construct CHF prediction models and, thus, produce CHF look-up tables reporting estimated CHF values under a much wider variety of conditions. This study proposes an effective prediction approach based on stacking ensemble learning and a new optimization technique, namely 3-pyramids, to get reliable CHF look-up prediction results. The approach is divided into three parts that include (1) using multiple prediction models constructed using different machine learning techniques, (2) combining the prediction results using a stacking strategy to generate the final prediction results, and (3) optimizing an improved super learner model to improve the prediction model's capabilities using 3-pyramids optimization technique. The effectiveness of these models was evaluated using the coefficient of determination (R²), mean absolute error (MAE), and root-mean-squared error (RMSE). The results show that the proposed model has high accuracy and that applying the 3-pyramids optimization further improved this accuracy and significantly reduced the model execution time.

2023-08-04 — Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner

Authors: Zachary Butzin-Dozier, Yunwen Ji, Haodong Li, Jeremy Coyle, Junming Shi, Rachael V. Phillips, Andrew N. Mertens, R. Pirracchio, M. J. Laan, Rena C. Patel, J. Colford, A. Hubbard, -. N. Consortium
Year: 2023
Publication Date: 2023-08-04
Venue: medRxiv
DOI: 10.1101/2023.07.27.23293272
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Post-acute Sequelae of COVID-19 (PASC), also known as Long COVID, is a broad grouping of a range of long-term symptoms following acute COVID-19 infection. An understanding of characteristics that are predictive of future PASC is valuable, as this can inform the identification of high-risk individuals and future preventative efforts. However, current knowledge regarding PASC risk factors is limited. Using a sample of 55,257 participants from the National COVID Cohort Collaborative, as part of the NIH Long COVID Computational Challenge, we sought to predict individual risk of PASC diagnosis from a curated set of clinically informed covariates. We predicted individual PASC status, given covariate information, using Super Learner (an ensemble machine learning algorithm also known as stacking) to learn the optimal, AUC-maximizing combination of gradient boosting and random forest algorithms. We were able to predict individual PASC diagnoses accurately (AUC 0.947). Temporally, we found that baseline characteristics were most predictive of future PASC diagnosis, compared with characteristics immediately before, during, or after COVID-19 infection. This finding supports the hypothesis that clinicians may be able to accurately assess the risk of PASC in patients prior to acute COVID diagnosis, which could improve early interventions and preventive care. We found that medical utilization, demographics and anthropometry, and respiratory factors were most predictive of PASC diagnosis. This highlights the importance of respiratory characteristics in PASC risk assessment. The methods outlined here provide an open-source, applied example of using Super Learner to predict PASC status using electronic health record data, which can be replicated across a variety of settings.

2023-08-02 — Parametric and nonparametric propensity score estimation in multilevel observational studies

Authors: Marie Salditt, S. Nestler
Year: 2023
Publication Date: 2023-08-02
Venue: Statistics in Medicine
DOI: 10.1002/sim.9852
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
There has been growing interest in using nonparametric machine learning approaches for propensity score estimation in order to foster robustness against misspecification of the propensity score model. However, the vast majority of studies focused on single‐level data settings, and research on nonparametric propensity score estimation in clustered data settings is scarce. In this article, we extend existing research by describing a general algorithm for incorporating random effects into a machine learning model, which we implemented for generalized boosted modeling (GBM). In a simulation study, we investigated the performance of logistic regression, GBM, and Bayesian additive regression trees for inverse probability of treatment weighting (IPW) when the data are clustered, the treatment exposure mechanism is nonlinear, and unmeasured cluster‐level confounding is present. For each approach, we compared fixed and random effects propensity score models to single‐level models and evaluated their use in both marginal and clustered IPW. We additionally investigated the performance of the standard Super Learner and the balance Super Learner. The results showed that when there was no unmeasured confounding, logistic regression resulted in moderate bias in both marginal and clustered IPW, whereas the nonparametric approaches were unbiased. In presence of cluster‐level confounding, fixed and random effects models greatly reduced bias compared to single‐level models in marginal IPW, with fixed effects GBM and fixed effects logistic regression performing best. Finally, clustered IPW was overall preferable to marginal IPW and the balance Super Learner outperformed the standard Super Learner, though neither worked as well as their best candidate model.

2023-08-01 — Super learner ensemble model: A novel approach for predicting monthly copper price in future

Authors: Jue Zhao, Shahab Hosseini, Qinyang Chen, D. Jahed Armaghani
Year: 2023
Publication Date: 2023-08-01
Venue: Resources policy
DOI: 10.1016/j.resourpol.2023.103903
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023-07-19 — Persistent poverty and child dental caries: time-varying exposure analysis

Authors: Y. Matsuyama, A. Isumi, S. Doi, T. Fujiwara
Year: 2023
Publication Date: 2023-07-19
Venue: Journal of Epidemiology and Community Health
DOI: 10.1136/jech-2022-220073
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background This study investigated the cumulative impact of persistent poverty on dental caries among elementary schoolchildren in Japan. Methods Data were derived from four-wave longitudinal data of children in all public elementary schools in Adachi City, Tokyo, Japan, from 2015 to 2020 (n=4291, response rate: 80.1%–83.8%). Poverty status, defined as annual household income <JPY3 million, material deprivation or payment difficulties for lifeline utilities, was assessed by caregiver questionnaires when the children were in the first, second, fourth and sixth grades. School dentists assessed dental caries. We estimated the difference in the number of primary and permanent teeth with incidences of dental caries from second to sixth grade by persistent poverty and never having experienced poverty. Targeted maximum likelihood estimation was used to consider baseline and time-varying confounders. Results Children with persistent poverty experienced more dental caries (mean: 3.81, SD: 3.73) than children who had never experienced poverty (mean: 2.39, SD: 3.27). After controlling for confounders, being in persistent poverty was significantly associated with having more dental caries than never being in poverty (mean difference: 1.54, 95% CI 0.60, 2.48). The magnitude of the association was greater than that of poverty assessed at first grade only (mean difference: 0.75, 95% CI 0.35, 1.16) or experience of poverty at any of the four waves (mean difference: 0.69, 95% CI 0.39, 0.99). Conclusion The cumulative impact of persistent poverty could be larger than the poverty assessed at a single time point.

2023-07-01 — Poster 362: Identifying Racial Disparity in Utilization and Outcomes of Hip Arthroscopy using Machine Learning

Authors: Yining Lu, Erick M. Marigi, Kareme D. Alder, John P. Mickley, Christopher L. Camp, B. Levy, A. Krych, K. Okoroha
Year: 2023
Publication Date: 2023-07-01
Venue: Orthopaedic Journal of Sports Medicine
DOI: 10.1177/2325967123S00325
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Objectives: Background: Arthroscopic diagnosis and treatment of femoroacetabular pathology has been increasingly used in the past thirty years with interventions resulting in improved hip function and ultimate delay of hip arthroplasty in a minimally invasive manner. Unfortunately, previous investigations have observed decreased rates of access, utilization of, and outcomes following orthopedic interventions such as hip arthroplasty in underrepresented patients. The purpose of this study is to examine racial differences in procedural rates, outcomes, and complications in patients undergoing hip arthroscopy. Methods: The State Ambulatory Surgery and Services Database (SASD) and State Emergency Department Database (SEDD) of New York were queried for patients undergoing hip arthroscopy from 2011 to 2017. The primary outcomes investigated were utilization over time, total charges billed per encounter, 90-day emergency department visits, and revision hip arthroscopy. Patients were stratified into White and non-White race, and intergroup differences were evaluated with descriptive statistics. Subgroup analysis was performed with linear mixed-effects models to identify significant interactions between race and individual variables that contributed to any differences in the outcomes of interest. Temporal trends in utilization of hip arthroscopy and concomitant procedures between the two groups were analyzed with Poisson regression modeling. Finally, targeted maximum likelihood estimation (TMLE) was performed to provide nonparametric estimates of the specific differences in the outcomes studied using machine learning ensembles while controlling for patient risk factors. Results: A total of 9,745 patients underwent hip arthroscopy during the study period, with 1,081 patients of non-White race (11.9%). 
Results of Poisson regression demonstrated an annual increase of 1.11 in the incidence rate of hip arthroscopy among White patients, compared to 1.03 for non-White patients (p<0.001), with this disparity projected to increase by 2040. Based on TMLE utilizing an ensemble of machine learning models, non-White patients were significantly more likely to incur higher costs (OR: 1.30, 95% CI: 1.24-1.37, p<0.001) and visit the emergency department within 90-days (OR: 1.09, 95% CI: 1.01, 1.18, p=0.05), but had negligible differences in reoperation rates at 90 days to 2 years (OR: 1.13, 95% CI: 0.78-1.63, p=0.53). Subgroup analysis identified higher likelihood for 90-day emergency department admissions among non-White patients compared to White patients, which were significantly compounded by Medicare insurance (OR: 2.95, 95% CI 1.46-5.95, p=0.002), median income in the lowest quartile (OR: 1.84, 95% CI: 1.2-2.61, p=0.012), and residence in low-income neighborhoods (OR: 2.05, 95% CI: 1.31-3.2, p=0.006). Subgroup analysis for charges billed and reoperation did not identify significant findings. Conclusions: Hip arthroscopy remains an increasingly utilized surgical technique for the treatment of a myriad of hip disorders. Unfortunately, racial disparities exist and are worsening over time. Irrespective of insurance status, non-white patients undergo hip arthroscopy at a lower rate, incur higher costs, and more frequently experience unexpected returns to the emergency department. Improved initiatives to improve the disparity in access to and outcomes following hip arthroscopy must be addressed to further its utility for all patients.

2023-07-01 — Causal Evaluation of Post-Marketing Drugs for Drug-induced Liver Injury from Electronic Health Records

Authors: Yu Wang, Jing Ma, Shuang Ma, Jiaqi Wang, Jingsong Li
Year: 2023
Publication Date: 2023-07-01
Venue: Annual International Conference of the IEEE Engineering in Medicine and Biology Society
DOI: 10.1109/EMBC40787.2023.10340721
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Drug-induced liver injury (DILI) is one of the most common and serious adverse drug reactions that can lead to acute liver failure and death. Detection of DILI and causal estimation of drug-hepatotoxicity association are of great importance for patient safety. This paper proposes a framework for causal estimation of post-marketing drugs for DILI from real-world electronic health record (EHR) data. Randomized clinical trials were replicated at scale by automatically generating different user and non-user cohorts for each potential drug, and average treatment effects (ATEs) of drugs were estimated using targeted maximum likelihood estimation. Ten years of real-world EHRs were used to validate the framework. Of all 1199 single-ingredient drugs analyzed, 7 novel and 7 known drug-hepatotoxicity associations were found to be causal.

2023-06-30 — Penerapan Teknik Super Learner dalam Pemodelan Faktor yang Memengaruhi Rekomendasi Operator Seluler

Authors: Aulia Fitriyani, Bagus Sartono, Septian Rahardiantoro
Year: 2023
Publication Date: 2023-06-30
Venue: Xplore Journal of Statistics
DOI: 10.29244/xplore.v12i2.335
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Telecommunication refers to the exchange of information over long distances. Indonesia is one of the countries with the highest number of mobile network operators worldwide. This situation motivated Mobile Operator A to conduct a survey investigating store attendants’ recommendations to customers regarding the use of Operator A’s services. Classification methods can be applied to identify which operator a store attendant is likely to recommend based on several influencing factors. In this study, the super learner method is employed to integrate multiple base learners into a single optimized predictive model. The base learners used include random forest, bagging, and logistic regression. The resulting super learner model achieves an accuracy of 88.11% and an AUC of 0.9083. The most influential factor driving store attendants’ recommendations is whether Operator A is the best-selling internet provider in the respective store. Beyond individual effects, several interactions between pairs of explanatory variables are also found to play a significant role.

2023-06-30 — Kajian Algoritme Super Learner sebagai Metode Ensemble dalam Kasus Klasifikasi

Authors: F. Nadeak, Bagus Sartono, Anwar Fitrianto
Year: 2023
Publication Date: 2023-06-30
Venue: Xplore Journal of Statistics
DOI: 10.29244/xplore.v12i2.283
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Classification is a statistical approach used when the response variable is categorical. The classification process generally consists of two phases: model training and model testing. The Super Learner is an ensemble method that integrates multiple candidate algorithms into a single predictive model by using V-fold cross-validation to determine the optimal weighted combination of base learners. Although numerous studies in the Department of Statistics at IPB University have applied various classification techniques and achieved satisfactory average accuracy, misclassification remains an issue that could potentially be reduced through model optimization. This study investigates whether the Super Learner ensemble can improve classification accuracy relative to single-model approaches previously applied. In addition, the study examines the characteristics of the resulting Super Learner models and evaluates the conditions under which performance gains are most pronounced.

2023-06-15 — The causal effect of family physician program on the prevalence, screening, awareness, treatment, and control of hypertension and diabetes mellitus in an Eastern Mediterranean Region: a causal difference-in-differences analysis

Authors: N. Mohammadi, A. Alizadeh, S. Moghaddam, E. Ghasemi, N. Ahmadi, M. Yaseri, N. Rezaei, M. Mansournia
Year: 2023
Publication Date: 2023-06-15
Venue: BMC Public Health
DOI: 10.1186/s12889-023-16074-z
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background Hypertension (HTN) and diabetes mellitus (DM) as part of non-communicable diseases are among the most common causes of death worldwide, especially in the WHO’s Eastern Mediterranean Region (EMR). The family physician program (FPP) proposed by WHO is a health strategy to provide primary health care and improve the community’s awareness of non-communicable diseases. Since there was no clear focus on the causal effect of FPP on the prevalence, screening, and awareness of HTN and DM, the primary objective of this study is to determine the causal effect of FPP on these factors in Iran, which is an EMR country. Methods We conducted a repeated cross-sectional design based on two independent surveys of 42,776 adult participants in 2011 and 2016, of which 2301 individuals were selected from two regions where the family physician program was implemented (FPP) and where it wasn't (non-FPP). We used an Inverse Probability Weighting difference-in-differences and Targeted Maximum Likelihood Estimation analysis to estimate the average treatment effects on treated (ATT) using R version 4.1.1. Results The FPP implementation increased the screening (ATT = 36%, 95% CI: (27%, 45%), P-value < 0.001) and the control of hypertension (ATT = 26%, 95% CI: (1%, 52%), P-value = 0.03) based on the 2017 ACC/AHA guidelines; these results were in keeping with JNC7. There was no causal effect in other indexes, such as prevalence, awareness, and treatment. The DM screening (ATT = 20%, 95% CI: (6%, 34%), P-value = 0.004) and awareness (ATT = 14%, 95% CI: (1%, 27%), P-value = 0.042) were significantly increased in the FPP-administered region. However, the treatment of HTN decreased (ATT = -32%, 95% CI: (-59%, -5%), P-value = 0.012). Conclusion This study has identified some limitations related to the FPP in managing HTN and DM, and presented solutions to solve them in two general categories. 
Thus, we recommend that the FPP be revised before the generalization of the program to other parts of Iran.

2023-06-14 — Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters

Authors: Brian Cho, Yaroslav Mukhin, Kyra Gan, Ivana Malenica
Year: 2023
Publication Date: 2023-06-14
Venue: International Conference on Machine Learning
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
When estimating target parameters in nonparametric models with nuisance parameters, substituting the unknown nuisances with nonparametric estimators can introduce "plug-in bias." Traditional methods addressing this suboptimal bias-variance trade-off rely on the influence function (IF) of the target parameter. When estimating multiple target parameters, these methods require debiasing the nuisance parameter multiple times using the corresponding IFs, which poses analytical and computational challenges. In this work, we leverage the targeted maximum likelihood estimation (TMLE) framework to propose a novel method named kernel debiased plug-in estimation (KDPE). KDPE refines an initial estimate through regularized likelihood maximization steps, employing a nonparametric model based on reproducing kernel Hilbert spaces. We show that KDPE: (i) simultaneously debiases all pathwise differentiable target parameters that satisfy our regularity conditions, (ii) does not require the IF for implementation, and (iii) remains computationally tractable. We numerically illustrate the use of KDPE and validate our theoretical results.

2023-06-13 — How does women’s empowerment relate to antenatal care attendance? A cross-sectional analysis among rural women in Bangladesh

Authors: Solis Winters, H. Pitchik, Fahmida Akter, F. Yeasmin, Tania Jahir, T. Huda, Md. Mahbubur Rahman, P. Winch, S. Luby, L. Fernald
Year: 2023
Publication Date: 2023-06-13
Venue: BMC Pregnancy and Childbirth
DOI: 10.1186/s12884-023-05737-9
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background In South Asia, roughly half of women attend at least four antenatal care visits with skilled health personnel, the minimum number recommended by the World Health Organization for optimal birth outcomes. A much greater proportion of women attend at least one antenatal care visit, suggesting that a key challenge is ensuring that women initiate antenatal care early in pregnancy and continue to attend after their first visit. One critical barrier to antenatal care attendance may be that women do not have sufficient power in their relationships, households, or communities to attend antenatal care when they want to. The main goals of this paper were to 1) understand the potential effects of intervening on direct measures of women’s empowerment—including household decision making, freedom of movement, and control over assets—on antenatal care attendance in a rural population of women in Bangladesh, and 2) examine whether differential associations exist across strata of socioeconomic status. Methods We analyzed data on 1609 mothers with children under 24 months old in rural Bangladesh and employed targeted maximum likelihood estimation with ensemble machine learning to estimate population average treatment effects. Results Greater women’s empowerment was associated with an increased number of antenatal care visits. Specifically, among women who attended at least one antenatal care visit, having high empowerment was associated with a greater probability of ≥ 4 antenatal care visits, both in comparison to low empowerment (15.2 pp, 95% CI: 6.0, 24.4) and medium empowerment (9.1 pp, 95% CI: 2.5, 15.7). The subscales of women’s empowerment driving the associations were women’s decision-making power and control over assets. We found that greater women’s empowerment is associated with more antenatal care visits regardless of socioeconomic status. 
Conclusions Empowerment-based interventions, particularly those targeting women’s involvement in household decisions and/or facilitating greater control over assets, may be a valuable strategy for increasing antenatal care attendance. Trial registration ClinicalTrials.gov Identifier: NCT04111016, Date First Registered: 01/10/2019.

2023-06-05 — The Development of an AI-Based Network Security Algorithm for an IoT Healthcare Platform

Authors: Keith Lungile Ncube Mainford
Year: 2023
Publication Date: 2023-06-05
Venue: International Journal of Science and Research (IJSR)
DOI: 10.21275/sr23602005041
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
The Internet of things is made up of all IPv6-capable hardware that is linked to and communicates with one another via the Internet. Our civilization uses this common phenomenon on a daily basis. Two of the main obstacles in large-scale IoT installations are data privacy and security. This is especially true for important applications like Industry 4.0 and e-healthcare. Securing the IoT-cloud ecosystem for healthcare data is one of the hardest and tough issues of today. The IoT Cloud infrastructure is particularly susceptible to flaws and attacks because of the numerous sensors utilized to produce enormous amounts of data. This can make the network less secure. The finest technology for healthcare applications is artificial intelligence (AI), as it provides the best method for enhancing data security and reliability. The IoT cloud framework already uses a number of AI-based security mechanisms. Significant flaws in existing algorithms include complicated algorithm design and ineffective data processing. Additionally, they are unsuitable for analyzing unstructured data, which raises the price of IoT sensors. In order to improve the security and privacy of healthcare data stored in IoT clouds, this study introduces Probabilistic Super Learning (PSL) and Random Hashing (RH), two AI-based intelligence feature learning mechanisms. This research also employs the suggested learning approach to reduce the price of IoT sensors. The initial assault is discovered using this training model. The attack's properties are then changed in order to learn how attacks operate. Additionally, the data matrix's hash values are used to generate the random key. Elliptic Curve Cryptography is linked with this method for data security. The upgraded ECC-RH technique uses randomly generated hash keys to encrypt and decode data. Performance evaluation compares and validates the outcomes of various methodologies. 
A secure network layer is provided for IoT apps connected across 5G networks and beyond in the context of the final analysis of bio-inspired algorithms.

2023-06-01 — P-083 Associations of sperm epigenetic aging with semen parameters among men from the U.S. general population

Authors: R. Pilsner, S. Sawant, E. Houle, O. Oladele
Year: 2023
Publication Date: 2023-06-01
Venue: Human Reproduction
DOI: 10.1093/humrep/dead093.448
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Is sperm epigenetic aging (SEA) associated with semen parameters among men in the general population? While SEA was not associated with general semen parameters, advanced SEA was positively associated with sperm head defects such as length, perimeter, and pyriforms. We have previously shown that advanced SEA was strongly associated with longer time-to-pregnancy among couples in the general population. A population-based prospective cohort study of couples discontinuing contraception to become pregnant recruited from 16 US counties from 2005 to 2009. Sperm DNA methylation from 379 semen samples was assessed via Illumina EPIC Array and SEA was estimated using Super Learner, an ensemble machine learning algorithm. Linear regression models were employed to examine the associations between semen parameters and SEA adjusting for male age and current smoking. None of the general semen characteristics such as count, concentration, motility or morphology were associated with SEA. However, several sperm head parameters were positively associated with SEA including length (β = 3.6, 95% confidence interval (CI): 1.01 - 6.23; p = 0.007); perimeter (β = 4.04, 95% CI: 0.1 – 0.05; p = 0.045) and pyriforms (β = 0.29, 95% CI: 0.1-0.49; p = 0.003). SEA was also inversely related to sperm elongation factor (β = -2.9, 95% CI: -4.8 - -1.1; p = 0.002). This prospective cohort study consisted primarily of Caucasian men and women and thus large diverse cohorts are necessary to confirm the associations between SEA and sperm head defects in other races/ethnicities. These data suggest that advanced sperm epigenetic aging may be related to improper sperm head condensation during spermatogenesis.

2023-06-01 — Impact of life-sustaining therapies on critically ill patients with cancer.

Authors: K. Shah, João Matos, T. Struja, J. Svoboda, L. Celi, C. Sauer
Year: 2023
Publication Date: 2023-06-01
Venue: Journal of Clinical Oncology
DOI: 10.1200/jco.2023.41.16_suppl.e13590
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Advances in diagnosis and treatment have led to large reductions in mortality rates for patients with cancer, resulting in a steady increase in patients admitted to intensive care units (ICUs). However, there is conflicting evidence supporting the benefit of common life-sustaining therapies for critically ill patients within this cohort. We examined differences in treatment effects of life-sustaining therapies between critically ill septic patients with and without cancer. Methods: Adults aged 18+ with a principal cancer diagnosis from 2008 - 2019, admitted to the ICU for sepsis, were extracted from publicly accessible ICU databases: Medical Information Mart for Intensive Care IV (MIMIC-IV) and the eICU Collaborative Research Database (eICU-CRD). Logistic regression estimated the likelihood of receiving mechanical ventilation (MV), renal replacement therapy (RRT) or vasopressors (VP) during ICU admission. Additionally, targeted maximum likelihood estimation (TMLE) models estimated average treatment effects of MV, VP, and RRT on mortality. As TMLE uses machine learning models, it accommodates large numbers of covariates with complex, non-linear relationships. The method outputs Average treatment effect (ATE), an odds ratio representing the mean difference in outcomes in a hypothetical world in which all patients received treatment, compared to a hypothetical world in which no patients received treatment. Models were adjusted for age, sex, ethnicity, Charlson Comorbidity Index, SOFA score, hypertension, heart failure, asthma, COPD, CKD, and code status at admission. Results: A total of 7,120 adults met inclusion criteria. Septic patients with cancer did not have a significantly different likelihood of receiving MV (aOR [95%CI], 0.94 [0.6 - 1.46]), RRT (0.79, [0.33 - 1.93]), or VP (1.09, [0.74, 1.9]) than septic patients without cancer. 
Among patients with low SOFA scores, those with cancer were more likely than those without cancer to benefit from RRT. Otherwise, there was no statistically significant difference in mortality with the use of MV, RRT, and VP between patients with and without cancer (p>0.05). Conclusions: To our knowledge this is the first study to utilize contemporary, nationwide critical care data to establish a causal relationship between mortality and the use of common life-sustaining therapies for patients with cancer. Our study highlights the tremendous advances in cancer treatment over the last decade, leading to similar effects of common critical care interventions on mortality, regardless of cancer diagnosis.

2023-06-01 — Ceiling Effect of the Combined Norwegian and Danish Knee Ligament Registers Limits Anterior Cruciate Ligament Reconstruction Outcome Prediction

Authors: R. K. Martin, S. Wastvedt, Ayoosh Pareek, A. Persson, H. Visnes, A. Fenstad, G. Moatshe, J. Wolfson, M. Lind, L. Engebretsen
Year: 2023
Publication Date: 2023-06-01
Venue: American Journal of Sports Medicine
DOI: 10.1177/03635465231177905
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Clinical tools based on machine learning analysis now exist for outcome prediction after primary anterior cruciate ligament reconstruction (ACLR). Relying partly on data volume, the general principle is that more data may lead to improved model accuracy. Purpose/Hypothesis: The purpose was to apply machine learning to a combined data set from the Norwegian and Danish knee ligament registers (NKLR and DKRR, respectively), with the aim of producing an algorithm that can predict revision surgery with improved accuracy relative to a previously published model developed using only the NKLR. The hypothesis was that the additional patient data would result in an algorithm that is more accurate. Study Design: Cohort study; Level of evidence, 3. Methods: Machine learning analysis was performed on combined data from the NKLR and DKRR. The primary outcome was the probability of revision ACLR within 1, 2, and 5 years. Data were split randomly into training sets (75%) and test sets (25%). There were 4 machine learning models examined: Cox lasso, random survival forest, gradient boosting, and super learner. Concordance and calibration were calculated for all 4 models. Results: The data set included 62,955 patients in which 5% underwent a revision surgical procedure with a mean follow-up of 7.6 ± 4.5 years. The 3 nonparametric models (random survival forest, gradient boosting, and super learner) performed best, demonstrating moderate concordance (0.67 [95% CI, 0.64-0.70]), and were well calibrated at 1 and 2 years. Model performance was similar to that of the previously published model (NKLR-only model: concordance, 0.67-0.69; well calibrated). Conclusion: Machine learning analysis of the combined NKLR and DKRR enabled prediction of the revision ACLR risk with moderate accuracy. 
However, the resulting algorithms were less user-friendly and did not demonstrate superior accuracy in comparison with the previously developed model based on patients from the NKLR alone, despite the analysis of nearly 63,000 patients. This ceiling effect suggests that simply adding more patients to current national knee ligament registers is unlikely to improve predictive capability and may prompt future changes to increase variable inclusion.

2023-06-01 — #4356 DYNAMIC EFFECT OF PROTEINURIA ON ADVERSE KIDNEY OUTCOMES WITH TARGETED MAXIMUM LIKELIHOOD ESTIMATION METHOD: RESULT FROM THE KNOW-CKD STUDY

Authors: Y. Oh, K. Oh, Wookyung Chung, J. Jung
Year: 2023
Publication Date: 2023-06-01
Venue: Nephrology, Dialysis and Transplantation
DOI: 10.1093/ndt/gfad063c_4356
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, targeted minimum loss based estimation, tmle

Abstract:
Proteinuria is a risk factor for the progression of chronic kidney disease (CKD), but causal inference about its effect may be challenging due to time-varying exposure and confounding, changes in the degree of proteinuria, and loss to follow-up. A total of 1,223 patients enrolled as part of The Korean Cohort Study for Outcome in Patients With Chronic Kidney Disease (KNOW-CKD) were analyzed at 3 observational points (baseline, 3-year; early, and 7-year; late) at which spot urine protein creatinine ratio (UPCR) was measured. We used longitudinal targeted minimum loss-based estimation (TMLE) and marginal structural models (MSM) to estimate the cumulative incidence of adverse kidney outcomes (eGFR halving and kidney failure requiring replacement therapy) of dynamic exposure to high proteinuria (UPCR≥1g/g) comparing counterfactuals (UPCR<1g/g) adjusted for time-varying confounding (systolic blood pressure, estimated glomerular filtration rate, and renin-angiotensin-aldosterone blockers) and baseline covariates (age, sex, diabetes, cardiovascular disease, and smoking). Patients with sustained high proteinuria throughout 7-year follow-up had a significantly higher risk of renal events than those with counterfactuals (relative risk [RR], 3.120; 95% confidence interval, 2.150-4.528; P < 0.001). The RRs were 0.168 (0.055-0.517; P = 0.002) and 0.613 (0.337-1.114; P = 0.108) for early and late proteinuria reduction, respectively, compared with sustained high proteinuria. In MSM, the hazard ratios (HRs) for cumulative high-proteinuria frequency were 1.612 (1.463-1.844; P <0.001) up to 3 years and 1.102 (0.877-1.387; P = 0.404) up to 7 years, respectively. In CKD patients, sustained higher proteinuria is the main risk factor during the entire observation period. However, considering dynamic exposure, proteinuria reduction from the beginning has a protective effect on renal progression, but the effect weakens as the observation period becomes longer.

2023-05-31 — Impact of macrolide treatment on long-term mortality in patients admitted to the ICU due to CAP: a targeted maximum likelihood estimation and survival analysis

Authors: L. F. Reyes, E. Garcia, Elsa D. Ibáñez-Prada, Cristian C Serrano-Mayorga, Yuli V Fuentes, Alejandro Rodríguez, G. Moreno, A. Bastidas, Josep Gómez, Angélica Gonzalez, C. Frei, L. Celi, I. Martín-Loeches, G. Waterer
Year: 2023
Publication Date: 2023-05-31
Venue: Critical Care
DOI: 10.1186/s13054-023-04466-x
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Introduction Patients with community-acquired pneumonia (CAP) admitted to the intensive care unit (ICU) have high mortality rates during the acute infection and up to ten years thereafter. Recommendations from international CAP guidelines include macrolide-based treatment. However, there is no data on the long-term outcomes of this recommendation. Therefore, we aimed to determine the impact of macrolide-based therapy on long-term mortality in this population. Methods Registered patients in the MIMIC-IV database 16 years or older and admitted to the ICU due to CAP were included. Multivariate analysis, targeted maximum likelihood estimation (TMLE) to simulate a randomised controlled trial, and survival analyses were conducted to test the effect of macrolide-based treatment on mortality at six months (6 m) and twelve months (12 m) after hospital admission. A sensitivity analysis was performed excluding patients with Pseudomonas aeruginosa or MRSA pneumonia to control for Healthcare-Associated Pneumonia (HCAP). Results 3775 patients were included, and 1154 were treated with a macrolide-based treatment. The non-macrolide-based group had worse long-term clinical outcomes, represented by 6 m [31.5 (363/1154) vs 39.5 (1035/2621), p < 0.001] and 12 m mortality [39.0 (450/1154) vs 45.7 (1198/2621), p < 0.001]. The main risk factors associated with long-term mortality were Charlson comorbidity index, SAPS II, septic shock, and respiratory failure. Macrolide-based treatment reduced the risk of dying at 6 m [HR (95% CI) 0.69 (0.60, 0.78), p < 0.001] and 12 m [0.72 (0.64, 0.81), p < 0.001]. After TMLE, the protective effect continued with an additive effect estimate of − 0.069. Conclusion Macrolide-based treatment reduced the hazard risk of long-term mortality by almost one-third. This effect remains after simulating an RCT with TMLE and the sensitivity analysis for the HCAP classification.

2023-05-25 — Investigating a Paradox: Towards Better Understanding the Relationships Between Racial Group Membership, Stress, and Major Depressive Disorder.

Authors: John R Pamplin Ii, K. Rudolph, K. Keyes, E. Susser, L. Bates
Year: 2023
Publication Date: 2023-05-25
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwad128
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Epidemiologic studies in the U.S. routinely report a lower or equal prevalence of major depressive disorder (MDD) for Black people relative to white people. Within racial groups, individuals with greater life-stressor exposure experience greater prevalence of MDD; however, between racial groups this pattern does not hold. Informed by theoretical and empirical literature seeking to explain this "Black-white depression paradox," we outline two proposed models for the relationships between racial group membership, life-stressor exposure, and MDD: an Effect Modification model and an Inconsistent Mediator model. Either model could explain the paradoxical within- and between-racial group patterns of life-stressor exposure and MDD. We empirically estimate associations under each of the proposed models using data from 26,960 self-identified Black and white participants of the National Epidemiologic Survey on Alcohol and Related Conditions - III. Under the Effect Modification model, we estimated relative risk effect modification using parametric regression with a cross-product term, and under the Inconsistent Mediation model, we estimated interventional direct and indirect effects using Targeted Minimum Loss-based Estimation. We found evidence of inconsistent mediation-i.e., direct and indirect effects operating in opposite directions-suggesting a need for greater consideration of explanations for racial patterns in MDD that operate independent of life-stressor exposure.

2023-05-25 — Implementing TMLE in the presence of a continuous outcome

Authors: Hanna A Frank, Mohammad Ehsanul Karim
Year: 2023
Publication Date: 2023-05-25
Venue: Research Methods in Medicine & Health Sciences
DOI: 10.1177/26320843231176662
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In a real-world observational data analysis setting, guessing the true model specification can be difficult for an analyst. Unfortunately, correct model specification is a core assumption for treatment effect estimation methods such as propensity score methods, G-computation, and regression techniques. Targeted maximum likelihood estimation (TMLE) is an alternative method that allows the use of data-adaptive and machine learning algorithms for model fitting. TMLE therefore does not require strict assumptions about the model specification but preserves the validity of the inference. Multiple studies have shown that TMLE outperforms other methods in certain real-world settings, making it a useful and potentially superior algorithm for causal inference. However, there is a lack of accessible resources for practitioners to understand the implementation. Hence the TMLE framework is the least-used method by practitioners in epidemiology literature. Recently a few accessible articles have been published, but they focus only on binary outcomes and demonstrations are done mainly with simulated data. This paper aims to fill the gap in the literature by providing a step-by-step TMLE implementation guide for a continuous outcome, using an openly accessible clinical dataset.

2023-05-19 — Leaflet information by the local government on mental health during the coronavirus disease 2019 pandemic: a cross-sectional study in a rural area in Japan

Authors: Ryu Fukase, M. Murakami, T. Ikeda
Year: 2023
Publication Date: 2023-05-19
Venue: Family Practice
DOI: 10.1093/fampra/cmad059
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background The coronavirus disease 2019 (COVID-19) pandemic and associated infodemic increased depression and anxiety. Proper information can help combat the infodemic and promote mental health; however, rural residents have more difficulties in getting correct information than urban residents. Objective To examine whether the information on COVID-19 provided by the local government maintained the mental health of rural residents in Japan. Methods A self-administered questionnaire survey of Okura Village (northern district of Japan) residents aged ≥16 years was conducted in October 2021. The main outcomes, depressive symptoms, psychological distress, and anxiety were measured using the Center for Epidemiologic Studies Depression Scale, Kessler Psychological Distress Scale, and Generalized Anxiety Disorder scale 7-item. Exposure was defined as whether the resident read the leaflet on COVID-19 distributed by the local government. The targeted maximum likelihood estimation was used to analyse the effect of leaflet reading on the main outcomes. Results A total of 974 respondents were analysed. Reading the leaflet was associated with a significantly lower risk of depressive symptoms (relative risk [95% confidence interval]: 0.64 [0.43–0.95]). Meanwhile, no clear effects of leaflet reading were observed on mental distress and anxiety. Conclusions In rural areas with local governments, analogue information may be effective to prevent depression.

2023-05-17 — Nonparametric estimation of the interventional disparity indirect effect among the exposed

Authors: H. Rytgaard, A. Møller, Thomas Gerds
Year: 2023
Publication Date: 2023-05-17
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
In situations with non-manipulable exposures, interventions can be targeted to shift the distribution of intermediate variables between exposure groups to define interventional disparity indirect effects. In this work, we present a theoretical study of identification and nonparametric estimation of the interventional disparity indirect effect among the exposed. The targeted estimand is intended for applications examining the outcome risk among an exposed population for which the risk is expected to be reduced if the distribution of a mediating variable was changed by a (hypothetical) policy or health intervention that targets the exposed population specifically. We derive the nonparametric efficient influence function, study its double robustness properties and present a targeted minimum loss-based estimation (TMLE) procedure. All theoretical results and algorithms are provided for both uncensored and right-censored survival outcomes. With offset in the ongoing discussion of the interpretation of non-manipulable exposures, we discuss relevant interpretations of the estimand under different sets of assumptions of no unmeasured confounding and provide a comparison of our estimand to other related estimands within the framework of interventional (disparity) effects. Small-sample performance and double robustness properties of our estimation procedure are investigated and illustrated in a simulation study.

2023-05-06 — Predicting Seasonal Influenza Hospitalizations Using an Ensemble Super Learner: A Simulation Study

Authors: Jason R Gantenberg, K. McConeghy, C. Howe, Jon Steingrimsson, R. van Aalst, A. Chit, A. Zullo
Year: 2023
Publication Date: 2023-05-06
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwad113
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Accurate forecasts can inform response to outbreaks. Most efforts in influenza forecasting have focused on predicting influenza-like activity, with fewer on influenza-related hospitalizations. We conducted a simulation study to evaluate a super learner’s predictions of 3 seasonal measures of influenza hospitalizations in the United States: peak hospitalization rate, peak hospitalization week, and cumulative hospitalization rate. We trained an ensemble machine learning algorithm on 15,000 simulated hospitalization curves and generated weekly predictions. We compared the performance of the ensemble (weighted combination of predictions from multiple prediction algorithms), the best-performing individual prediction algorithm, and a naive prediction (median of a simulated outcome distribution). Ensemble predictions performed similarly to the naive predictions early in the season but consistently improved as the season progressed for all prediction targets. The best-performing prediction algorithm in each week typically had similar predictive accuracy compared with the ensemble, but the specific prediction algorithm selected varied by week. An ensemble super learner improved predictions of influenza-related hospitalizations, relative to a naive prediction. Future work should examine the super learner’s performance using additional empirical data on influenza-related predictors (e.g., influenza-like illness). The algorithm should also be tailored to produce prospective probabilistic forecasts of selected prediction targets.

2023-05-03 — Semiparametric discovery and estimation of interaction in mixed exposures using stochastic interventions

Authors: David McCoy, A. Hubbard, Mark van der Laan, Alejandro Schuler
Year: 2023
Publication Date: 2023-05-03
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2024-0058
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Understanding the complex interactions among multiple environmental exposures is critical for assessing their combined impact on health outcomes. This study introduces InterXshift, a novel semiparametric method that provides a nonparametric definition of interaction and facilitates both the discovery and efficient estimation of interaction effects in mixed exposures. Leveraging stochastic shift interventions and ensemble machine learning, InterXshift identifies and quantifies interactions through a model-independent target parameter, estimated using targeted maximum likelihood estimation (TMLE) and cross-validation. The approach contrasts expected outcomes from joint interventions against those from individual exposures, enabling the detection of synergistic and antagonistic interactions. Validation through simulations and application to the National Institute of Environmental Health Sciences (NIEHS) Mixtures Workshop data demonstrate InterXshift’s efficacy in accurately identifying true interaction directions and consistently highlighting significant impacts. We apply our methodology to National Health and Nutrition Examination Survey (NHANES) data to understand the interaction effect (if any) of furan exposure on leukocyte telomere length. This method enhances the analysis of multi-exposure interactions within high-dimensional datasets, offering robust methodological improvements for elucidating complex exposure dynamics in environmental health research. Additionally, we provide an open-source implementation of InterXshift in the InterXshift R package, facilitating its adoption and application by the research community.

2023-05-02 — Machine learning-driven multifunctional peptide engineering for sustained ocular drug delivery

Authors: Henry T. Hsueh, Renee Ti Chou, U. Rai, Wathsala Liyanage, Y. Kim, Matthew B Appell, Jahnavi Pejavar, Kirby T Leo, Charlotte Davison, Patricia Kolodziejski, Ann Mozzer, HyeYoung Kwon, Maanasa Sista, Nicole M. Anders, Avelina Hemingway, S. Rompicharla, M. Edwards, I. Pitha, J. Hanes, M. P. Cummings, L. Ensign
Year: 2023
Publication Date: 2023-05-02
Venue: Nature Communications
DOI: 10.1038/s41467-023-38056-w
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Sustained drug delivery is critical for patient adherence to chronic disease treatments. Here the authors apply machine learning to engineer multifunctional peptides with high melanin binding, high cell-penetration, and low cytotoxicity, enhancing the duration and efficacy of peptide-drug conjugates for sustained ocular delivery. Sustained drug delivery strategies have many potential benefits for treating a range of diseases, particularly chronic diseases that require treatment for years. For many chronic ocular diseases, patient adherence to eye drop dosing regimens and the need for frequent intraocular injections are significant barriers to effective disease management. Here, we utilize peptide engineering to impart melanin binding properties to peptide-drug conjugates to act as a sustained-release depot in the eye. We develop a super learning-based methodology to engineer multifunctional peptides that efficiently enter cells, bind to melanin, and have low cytotoxicity. When the lead multifunctional peptide (HR97) is conjugated to brimonidine, an intraocular pressure lowering drug that is prescribed for three times per day topical dosing, intraocular pressure reduction is observed for up to 18 days after a single intracameral injection in rabbits. Further, the cumulative intraocular pressure lowering effect increases ~17-fold compared to free brimonidine injection. Engineered multifunctional peptide-drug conjugates are a promising approach for providing sustained therapeutic delivery in the eye and beyond.

2023-05-01 — Predicting Miscarriage and Stillbirth Using Weighted Ensemble Machine Learning [ID: 1338167]

Authors: Anagha Lokhande, A. Gimovsky, I. Sarkar
Year: 2023
Publication Date: 2023-05-01
Venue: Obstetrics & Gynecology
DOI: 10.1097/01.AOG.0000930064.05901.88
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
INTRODUCTION: Prediction of miscarriage and stillbirth remains a clinical challenge. Prior efforts to use machine learning tools have not used an ensemble weighted machine learning approach, called “super learner,” which offers the opportunity to improve prediction performance by aggregating the outputs of constituent machine learning models and preferentially weighting the highest-performing models. METHODS: Data were obtained from hospital-wide electronic health records from a large academic institution. The sample comprised 13,337 patients who delivered between 2008 and 2019, 6,932 of whom experienced a miscarriage or stillbirth. Phecodes for ICD-9-CM and ICD-10-CM were used to define miscarriage and stillbirth and create comorbidity categories. The constituent models of the super learner were XGBoost, random forest, a regularized generalized linear model (both lasso and ridge regressions), and a support vector machine. The objective of this study was to develop a super learner algorithm to predict miscarriage and stillbirth. RESULTS: The super learner model predicted miscarriage and stillbirth classification with an area under the receiver operating characteristic curve of 0.94 and an accuracy of 92%. It used two models: random forest, weighted at 73%, and SVM, weighted at 27%. The most highly weighted predictors were amniotic cavity abnormalities, pelvic soft tissue abnormalities, and preeclampsia. CONCLUSION: A super learner performs comparably to other models in the literature. External validation is warranted. The promising results suggest that a super learner model can be used as a clinical decision support tool, supplementing clinical judgement in predicting miscarriage and stillbirth.
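The weighting step described in this abstract can be illustrated with a minimal sketch. It shows only the final combination of per-model predicted probabilities using the reported weights (random forest 73%, SVM 27%); in an actual super learner the weights are estimated by cross-validation, which is not shown here. The function name `weighted_ensemble` and the example probabilities are hypothetical and are not taken from the paper.

```python
import numpy as np

def weighted_ensemble(predictions, weights):
    """Combine per-model predicted probabilities with convex weights."""
    predictions = np.asarray(predictions, dtype=float)  # shape (n_models, n_samples)
    weights = np.asarray(weights, dtype=float)          # shape (n_models,)
    if predictions.shape[0] != weights.shape[0]:
        raise ValueError("one weight per model is required")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Weighted average across models, one value per sample.
    return weights @ predictions

# Hypothetical predicted probabilities for three patients (illustrative only).
rf_probs = [0.90, 0.20, 0.60]   # random forest
svm_probs = [0.70, 0.40, 0.50]  # support vector machine

# Weights as reported in the abstract: 73% random forest, 27% SVM.
ensemble = weighted_ensemble([rf_probs, svm_probs], [0.73, 0.27])
print(ensemble)
```

Constraining the weights to be non-negative and to sum to 1 keeps the ensemble prediction within the range of the constituent predictions, which is the usual convention for this kind of weighted ensemble.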

2023-04-25 — A Within-Group Approach to Ensemble Machine Learning Methods for Causal Inference in Multilevel Studies

Authors: Youmi Suk
Year: 2023
Publication Date: 2023-04-25
Venue: Journal of Educational and Behavioral Statistics
DOI: 10.3102/10769986231162096
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Machine learning (ML) methods for causal inference have gained popularity due to their flexibility to predict the outcome model and the propensity score. In this article, we provide a within-group approach for ML-based causal inference methods in order to robustly estimate average treatment effects in multilevel studies when there is cluster-level unmeasured confounding. We focus on one particular ML-based causal inference method based on the targeted maximum likelihood estimation (TMLE) with an ensemble learner called SuperLearner. Through our simulation studies, we observe that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. Also, using within-group propensity scores estimated from fixed effects logistic regression increases the robustness of the proposed within-group TMLE method. Even if the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates due to double robustness with flexible modeling, unlike parametric-based inverse propensity weighting methods. We demonstrate our proposed methods and conduct sensitivity analyses against the number of groups and individual-level unmeasured confounding to evaluate the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study.

2023-04-21 — Impact of Teeth on Social Participation: Modified Treatment Policy Approach

Authors: U. Cooray, G. Tsakos, A. Heilmann, Richard G Watt, K. Takeuchi, Katsunori Kondo, K. Osaka, Jun Aida
Year: 2023
Publication Date: 2023-04-21
Venue: Journal of Dental Research
DOI: 10.1177/00220345231164106
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Social participation prevents social isolation and loneliness among older adults while having numerous positive effects on their health and well-being in rapidly aging societies. We aimed to estimate the effect of retaining more natural teeth on social participation among older adults in Japan. The analysis used longitudinal data from 24,872 participants in the Japan Gerontological Evaluation Study (2010, 2013, and 2016). We employed a longitudinal modified treatment policy approach to determine the effect of several hypothetical scenarios (preventive scenarios and tooth loss scenarios) on frequent social participation (1 = at least once a week/0 = less than once a week) after a 6-y follow-up. The corresponding statistical parameters were estimated using targeted minimum loss-based estimation (TMLE) method. Number of teeth category (edentate/1–9/10–19/≥20) was treated as a time-varying exposure, and the outcome estimates were adjusted for time-varying (income, self-rated health, marital status, instrumental activities of daily living, vision loss, hearing loss, major comorbidities, and number of household members) and time-invariant covariates (age, sex, education, baseline social participation). Less frequent social participation was associated with older age, male sex, lower income, low educational attainment, and poor self-rated health at the baseline. Social participation improved when tooth loss prevention scenarios were emulated. The best preventive scenario (i.e., maintaining ≥20 teeth among each participant) improved social participation by 8% (risk ratio [RR] = 1.08; 95% confidence interval [CI], 1.05–1.11). Emulated tooth loss scenarios gradually decreased social participation. A hypothetical scenario in which all the participants were edentate throughout the follow-up period resulted in a 11% (RR = 0.89; 95% CI, 0.84–0.94) reduction in social participation. 
Subsequent tooth loss scenarios showed 8% (RR = 0.92; 95% CI, 0.88–0.95), 6% (RR = 0.94; 95% CI, 0.91–0.97), and 4% (RR = 0.96; 95% CI, 0.93–0.98) reductions, respectively. Thus, among Japanese older adults, retaining a higher number of teeth positively affects their social participation, whereas being edentate or having a relatively lower number of teeth negatively affects their social participation.

2023-04-13 — MUSSEL: Enhanced Bayesian Polygenic Risk Prediction Leveraging Information across Multiple Ancestry Groups

Authors: Jin Jin, Jianan Zhan, Jingning Zhang, Ruzhang Zhao, Jared O’Connell, Yunxuan Jiang, S. Buyske, Christopher R. Gignoux, C. Haiman, E. Kenny, C. Kooperberg, Kari North, B. Koelsch, G. Wojcik, Haoyu Zhang, N. Chatterjee
Year: 2023
Publication Date: 2023-04-13
Venue: bioRxiv
DOI: 10.1101/2023.04.12.536510
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using super learner to combine information across different tuning parameter settings and ancestry groups. In our simulation studies and data analyses of 16 traits across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. The method, for example, has an average gain in prediction R2 across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African Ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, underlying trait architecture, and the choice of reference samples for LD estimation, and thus ultimately, a combination of methods may be needed to generate the most robust PRS across diverse populations.

2023-04-11 — Targeted maximum likelihood based estimation for longitudinal mediation analysis

Authors: Ze-Yu Wang, L. van der Laan, M. Petersen, Thomas Gerds, K. Kvist, M. van der Laan
Year: 2023
Publication Date: 2023-04-11
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2023-0013
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation, tmle

Abstract:
Causal mediation analysis with random interventions has become an area of significant interest for understanding time-varying effects with longitudinal and survival outcomes. To tackle causal and statistical challenges due to the complex longitudinal data structure with time-varying confounders, competing risks, and informative censoring, there exists a general desire to combine machine learning techniques and semiparametric theory. In this article, we focus on targeted maximum likelihood estimation (TMLE) of longitudinal natural direct and indirect effects defined with random interventions. The proposed estimators are multiply robust, locally efficient, and directly estimate and update the conditional densities that factorize data likelihoods. We utilize the highly adaptive lasso (HAL) and projection representations to derive new estimators (HAL-EIC) of the efficient influence curves (EICs) of longitudinal mediation problems and propose a fast one-step TMLE algorithm using HAL-EIC while preserving the asymptotic properties. The proposed method can be generalized for other longitudinal causal parameters that are smooth functions of data likelihoods, and thereby provides a novel and flexible statistical toolbox.

2023-04-03 — Comparison of Propylthiouracil vs Methimazole for Thyroid Storm in Critically Ill Patients.

Authors: S. Y. Lee, Katherine L. Modzelewski, A. Law, A. Walkey, E. Pearce, N. Bosch
Year: 2023
Publication Date: 2023-04-03
Venue: JAMA Network Open
DOI: 10.1001/jamanetworkopen.2023.8655
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Importance Thyroid storm is the most severe form of thyrotoxicosis, with high mortality, and is treated with propylthiouracil and methimazole. Some guidelines recommend propylthiouracil over methimazole, although the difference in outcomes associated with each treatment is unclear. Objective To compare outcomes associated with use of propylthiouracil vs methimazole for the treatment of thyroid storm. Design, Setting, and Participants This comparative effectiveness study comprised a large, multicenter, US-based cohort from the Premier Healthcare Database between January 1, 2016, and December 31, 2020. It included 1383 adult patients admitted to intensive or intermediate care units with a diagnosis of thyroid storm per International Statistical Classification of Diseases and Related Health Problems, Tenth Revision codes and treated with either propylthiouracil or methimazole. Analyses were conducted from July 2022 to February 2023. Exposure Patients received either propylthiouracil or methimazole for treatment of thyroid storm. Exposure was assigned based on the initial thionamide administered. Main Outcomes and Measures The primary outcome was the adjusted risk difference of in-hospital death or discharge to hospice between patients treated with propylthiouracil and those treated with methimazole, assessed by targeted maximum likelihood estimation. Results A total of 1383 patients (656 [47.4%] treated with propylthiouracil; mean [SD] age, 45 [16] years; 473 women [72.1%]; and 727 [52.6%] treated with methimazole; mean [SD] age, 45 [16] years; 520 women [71.5%]) were included in the study. The standardized mean difference for age was 0.056, and the standardized mean difference for sex was 0.013. The primary composite outcome occurred in 7.4% of patients (102 of 1383; 95% CI, 6.0%-8.8%). 
A total of 8.5% (56 of 656; 95% CI, 6.4%-10.7%) of patients who initiated propylthiouracil and 6.3% (46 of 727; 95% CI, 4.6%-8.1%) who initiated methimazole died in the hospital (adjusted risk difference, 0.6% [95% CI, -1.8% to 3.0%]; P = .64). There were no significant differences in duration of organ support, total hospitalization costs, or rates of adverse events between the 2 treatment groups. Conclusion and Relevance In this comparative effectiveness study of a multicenter cohort of adult patients with thyroid storm, no significant differences were found in mortality or adverse events in patients who were treated with propylthiouracil or methimazole. Thus, current guidelines recommending propylthiouracil over methimazole for treatment of thyroid storm may merit reevaluation.

2023-04-03 — Beyond the Cox Hazard Ratio: A Targeted Learning Approach to Survival Analysis in a Cardiovascular Outcome Trial Application

Authors: David Chen, M. Petersen, H. Rytgaard, Randi Grøn, T. Lange, S. Rasmussen, R. Pratley, S. Marso, K. Kvist, J. Buse, M. J. van der Laan
Year: 2023
Publication Date: 2023-04-03
Venue: Statistics in Biopharmaceutical Research
DOI: 10.1080/19466315.2023.2173644
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
The Hazard Ratio (HR) is a well-established treatment effect measure in randomized trials involving right-censored time-to-events, and the Cardiovascular Outcome Trials (CVOTs) conducted since the FDA’s 2008 guidance have indeed largely evaluated excess risk by estimating a Cox HR. On the other hand, the limitations of the Cox model and of the HR as a causal estimand are well known, and the FDA’s updated 2020 CVOT guidance invites us to reassess this default approach to survival analyses. We highlight the shortcomings of Cox HR-based analyses and present an alternative following the causal roadmap, moving in a principled way from a counterfactual causal question to identifying a statistical estimand, and finally to targeted estimation in a large statistical model. We show in simulations the robustness of Targeted Maximum Likelihood Estimation (TMLE) to informative censoring and model misspecification and demonstrate a targeted learning analogue of the original Cox HR-based analysis of the Liraglutide Effect and Action in Diabetes: Evaluation of Cardiovascular Outcome Results (LEADER) trial. We discuss the potential reliability, interpretability, and efficiency gains to be had by updating our survival methods to incorporate the recent decades of advancements in formal causal frameworks and efficient nonparametric estimation.

2023-03-30 — Real-Time Causal Inference on Spark: Structured Streaming for Policy Lift

Authors: Murali Krishna Pasupuleti
Year: 2023
Publication Date: 2023-03-30
Venue: International Journal of Academic and Industrial Research Innovations (IJAIRI)
DOI: 10.62311/nesx/rp-30032023-53-63
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We design a real-time causal inference stack on Spark Structured Streaming to estimate policy lift with tight service-level objectives. Our pipeline performs online eligibility checks, randomization/switchback assignment, and stateful feature engineering, feeding doubly robust and TMLE estimators with CUPED variance reduction. Synthetic near-real traffic shows micro-batch and continuous modes reduce ATE bias and improve lift stability while meeting sub-second P95 latency at scale. We contribute a conceptual model, mathematical framing of bias/variance under watermarking and lateness, and a governance-ready evaluation protocol. Keywords: Spark Structured Streaming; Causal Inference; Policy Lift; Doubly Robust; TMLE; CUPED; Watermark; Micro-batch; SLA.

2023-03-28 — Disaggregating Latino nativity in equity research using electronic health records.

Authors: M. Marino, Katie Fankhauser, J. Minnier, J. Lucas, Sophia Giebultowicz, Jorge Kaufmann, Jun-Shik Hwang, S. Bailey, Danielle M. Crookes, A. Bazemore, S. Suglia, J. Heintzman
Year: 2023
Publication Date: 2023-03-28
Venue: Health Services Research
DOI: 10.1111/1475-6773.14154
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
OBJECTIVE To develop and validate prediction models for inference of Latino nativity to advance health equity research. DATA SOURCES/STUDY SETTING This study used electronic health records (EHRs) from 19,985 Latino children with self-reported country of birth seeking care from January 1, 2012 to December 31, 2018 at 456 community health centers (CHCs) across 15 states along with census-tract geocoded neighborhood composition and surname data. STUDY DESIGN We constructed and evaluated the performance of prediction models within a broad machine learning framework (Super Learner) for the estimation of Latino nativity. Outcomes included binary indicators denoting nativity (US vs. foreign-born) and Latino country of birth (Mexican, Cuban, Guatemalan). The performance of these models was compared using the area under the receiver operating characteristics curve (AUC) from an externally withheld patient sample. DATA COLLECTION/EXTRACTION METHODS Census surname lists, census neighborhood composition, and Forebears administrative data were linked to EHR data. PRINCIPAL FINDINGS Of the 19,985 Latino patients, 10.7% reported a non-US country of birth (5.1% Mexican, 4.7% Guatemalan, 0.8% Cuban). Overall, prediction models for nativity showed outstanding performance with external validation (US-born vs. foreign: AUC = 0.90; Mexican vs. non-Mexican: AUC = 0.89; Guatemalan vs. non-Guatemalan: AUC = 0.95; Cuban vs. non-Cuban: AUC = 0.99). CONCLUSIONS Among challenges facing health equity researchers in health services is the absence of methods for data disaggregation, and the specific ability to determine Latino country of birth (nativity) to inform disparities. Recent interest in more robust health equity research has called attention to the importance of data disaggregation. 
In a multistate network of CHCs using multilevel inputs from EHR data linked to surname and community data, we developed and validated novel prediction models for the use of available EHR data to infer Latino nativity for health disparities research in primary care and health services research, which is a significant potential methodologic advance in studying this population.

2023-03-28 — Adaptive Strategies for Retention in Care among Persons Living with HIV.

Authors: E. Geng, T. Odeny, L. Montoya, Sarah Iguna, Jayne L Kulzer, H. Adhiambo, I. Eshun-Wilson, E. Akama, Everlyne Nyandieka, M. Guzé, S. Shade, L. Packel, Branson Fox, C. Camlin, H. Thirumurthy, Catherine Lyons, E. Bukusi, M. Petersen
Year: 2023
Publication Date: 2023-03-28
Venue: NEJM Evidence
DOI: 10.1056/evidoa2200076
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND Optimizing retention in human immunodeficiency virus (HIV) treatment may require sequential behavioral interventions based on patients' response. METHODS In a sequential multiple assignment randomized trial in Kenya, we randomly assigned adults initiating HIV treatment to standard of care (SOC), Short Message Service (SMS) messages, or conditional cash transfers (CCT). Those with retention lapse (missed a clinic visit by ≥14 days) were randomly assigned again to standard-of-care outreach (SOC-Outreach), SMS+CCT, or peer navigation. Those randomly assigned to SMS or CCT who did not lapse after 1 year were randomly assigned again to either stop or continue the initial intervention. Primary outcomes were retention in care without an initial lapse, return to the clinic among those who lapsed, and time in care; secondary outcomes included adjudicated viral suppression. Average treatment effect (ATE) was calculated using targeted maximum likelihood estimation with adjustment for baseline characteristics at randomization and certain time-varying characteristics at rerandomization. RESULTS Among 1809 participants, 79.7% of those randomly assigned to CCT (n=523/656), 71.7% to SMS (n=393/548), and 70.7% to SOC (n=428/605) were retained in care in the first year (ATE: 9.9%; 95% confidence interval [CI]: 5.4%, 14.4% and ATE: 4.2%; 95% CI: -0.7%, 9.2% for CCT and SMS compared with SOC, respectively). Among 312 participants with an initial lapse who were randomly assigned again, 69.1% who were randomly assigned to a navigator (n=76/110) returned, 69.5% randomly assigned to CCT+SMS (n=73/105) returned, and 55.7% randomly assigned to SOC-Outreach (n=54/97) returned (ATE: 14.1%; 95% CI: 0.6%, 27.6% and ATE: 11.4%; 95% CI: -2.2%, 24.9% for navigator and CCT+SMS compared with SOC-Outreach, respectively). 
Among participants without lapse on SMS, continuing SMS did not affect retention (n=122/180; 67.8% retained) versus stopping (n=151/209; 72.2% retained; ATE: -4.4%; 95% CI: -16.6%, 7.9%). Among participants without lapse on CCT, those continuing CCT had higher retention (n=192/230; 83.5% retained) than those stopping (n=173/287; 60.3% retained; ATE: 28.6%; 95% CI: 19.9%, 37.3%). Among 15 sequenced strategies, initial CCT, escalated to navigator if lapse occurred and continued if no lapse occurred, increased time in care (ATE: 7.2%, 95% CI: 3.7%, 10.7%) and viral suppression (ATE: 8.2%, 95% CI: 2.2%, 14.2%), the most compared with SOC throughout. Initial SMS escalated to navigator if lapse occurred, and otherwise continued, showed similar effect sizes compared with SOC throughout. CONCLUSIONS Active interventions to prevent retention lapses followed by navigation for those who lapse and maintenance of initial intervention for those without lapse resulted in best overall retention and viral suppression among the strategies studied. Among those who remained in care, discontinuation of CCT, but not SMS, compromised retention and suppression. (Funded by National Institutes of Health grants R01 MH104123, K24 AI134413, and R01 AI074345; ClinicalTrials.gov number, NCT02338739.).

2023-03-27 — Comparative Effectiveness of Fludrocortisone and Hydrocortisone vs Hydrocortisone Alone Among Patients With Septic Shock.

Authors: N. Bosch, B. Teja, A. Law, B. Pang, S. Jafarzadeh, A. Walkey
Year: 2023
Publication Date: 2023-03-27
Venue: JAMA Internal Medicine
DOI: 10.1001/jamainternmed.2023.0258
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Importance Patients with septic shock may benefit from the initiation of corticosteroids. However, the comparative effectiveness of the 2 most studied corticosteroid regimens (hydrocortisone with fludrocortisone vs hydrocortisone alone) is unclear. Objective To compare the effectiveness of adding fludrocortisone to hydrocortisone vs hydrocortisone alone among patients with septic shock using target trial emulation. Design, Setting, and Participants This retrospective cohort study from 2016 to 2020 used the enhanced claims-based Premier Healthcare Database, which included approximately 25% of US hospitalizations. Participants were adult patients hospitalized with septic shock and receiving norepinephrine who began hydrocortisone treatment. Data analysis was performed from May 2022 to December 2022. Exposure Addition of fludrocortisone on the same calendar day that hydrocortisone treatment was initiated vs use of hydrocortisone alone. Main Outcome and Measures Composite of hospital death or discharge to hospice. Adjusted risk differences were calculated using doubly robust targeted maximum likelihood estimation. Results Analyses included 88 275 patients, 2280 who began treatment with hydrocortisone-fludrocortisone (median [IQR] age, 64 [54-73] years; 1041 female; 1239 male) and 85 995 (median [IQR] age, 67 [57-76] years; 42 136 female; 43 859 male) who began treatment with hydrocortisone alone. The primary composite outcome of death in hospital or discharge to hospice occurred among 1076 (47.2%) patients treated with hydrocortisone-fludrocortisone vs 43 669 (50.8%) treated with hydrocortisone alone (adjusted absolute risk difference, -3.7%; 95% CI, -4.2% to -3.1%; P < .001). Conclusions and Relevance In this comparative effectiveness cohort study among adult patients with septic shock who began hydrocortisone treatment, the addition of fludrocortisone was superior to hydrocortisone alone.
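For orientation, the crude (unadjusted) risk difference implied by the counts in this abstract can be computed directly. The sketch below is not TMLE and performs no covariate adjustment; it is a plain difference of proportions with a Wald interval, so it is expected to differ from the adjusted estimate reported by the authors (-3.7%). The function name `crude_risk_difference` is hypothetical.

```python
import math

def crude_risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Unadjusted risk difference (group A minus group B) with a Wald interval."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    diff = risk_a - risk_b
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(risk_a * (1 - risk_a) / n_a + risk_b * (1 - risk_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Counts from the abstract: hydrocortisone-fludrocortisone vs hydrocortisone alone.
rd, lower, upper = crude_risk_difference(1076, 2280, 43669, 85995)
print(f"crude risk difference: {rd:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The crude interval is much wider than the adjusted one in the abstract because the sketch uses only the raw group sizes, whereas the reported estimate also draws on covariate information.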

2023-03-21 — Causal Inference in Business Decision-Making: Integrating Machine Learning with Econometric Models for Accurate Business Forecasts

Authors: Md Zikar Hossan, Taslima Sultana
Year: 2023
Publication Date: 2023-03-21
Venue: International Journal of Technology, Management and Humanities
DOI: 10.21590/2885fr16
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In the contemporary world of increasingly data driven business, decision makers encounter a twofold issue of finding precise projections and gaining insights on how the cause and effect work. The tame econometric models offer a strong framework on which causal inference can be formulated but they are sometimes limited when it comes to dealing with complex high-dimensional data. Machine learning (ML) methods can be contrasted with extracting causality where a black box is transparent in causal interpretation but not in prediction and patterns. This article discusses how useful machine learning can be as an effort in synergizing it with econometric models to benefit business surveys in causal assumptions. Through the analysis of the critical methodological synergies (the application of causal forests, targeted maximum likelihood estimation (TMLE), and uplift modeling) the study proves that the integration of the strengths of predictive ability of machine learning and the inference strength of econometrics can provide more specific and useful information. The paper demonstrates the application of these hybrid methods based on the case studies in a wide range of the retail price, workforce productiveness, and marketing analytics. In the analysis, one does not merely see improvement in terms of forecast precision but also improvement in terms of supporting policy and investment decisions to be made with causality. The interpretation constraints, the selection problem, and ethical issues are also evaluated critically. The results give recommendation to the fact that an integrative model holds stronger and responsive business approaches which is opening up to evidence-based leadership in complicated circumstances in the market. Finally, the work has already entered the developing debate on interdisciplinary analytics, arguing that a reciprocating approach should rely on both predictive and explanatory research.

2023-03-20 — Antibiotic prescribing in remote versus face-to-face consultations for acute respiratory infections in English primary care: An observational study using TMLE

Authors: E. Vestesson, K. de Corte, P. Chappell, E. Crellin, G. Clarke
Year: 2023
Publication Date: 2023-03-20
Venue: medRxiv
DOI: 10.1101/2023.03.20.23287466
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background The COVID-19 pandemic has led to an ongoing increase in the use of remote consultations in general practice in England. Though the evidence is limited, there are concerns that the increase in remote consultations could lead to more antibiotic prescribing. Methods We used patient-level primary care data from the Clinical Practice Research Datalink to estimate the association between consultation mode (remote vs face-to-face) and antibiotic prescribing in England for acute respiratory infections (ARI) between April 2021 - March 2022. We used targeted maximum likelihood estimation, a causal machine learning method with adjustment for patient-, clinician- and practice-level factors. Findings There were 45,997 ARI consultations (34,555 unique patients), of which 28,127 were remote and 17,870 face-to-face. For children, 48% of consultations were remote whereas for adults 66% were remote. For children, 42% of remote and 43% face-to-face consultations led to an antibiotic prescription; the equivalent in adults was 52% of remote and 42% face-to-face. Adults with a remote consultation had 23% (Odds Ratio (OR) 1.23 95% Confidence Interval (CI): 1.18-1.29) higher chance of being prescribed antibiotics compared to if they had been seen face-to-face. We found no significant association between consultation mode and antibiotic prescribing in children (OR 1.04 95% CI 0.98-1.11). Interpretation This study uses rich patient-level data and robust statistical methods and represents an important contribution to the evidence base on antibiotic prescribing in post-COVID primary care. The higher rates of antibiotic prescribing in remote consultations for adults are cause for concern. We see no significant difference in antibiotic prescribing between consultation mode for children. These findings should inform antimicrobial stewardship activities for health care professionals and policy makers. 
Future research should examine differences in guideline-compliance between remote and face-to-face consultations to understand the factors driving antibiotic prescribing in different consultation modes. Funding No external funding. Keywords general practice; England; antibiotics; remote consultations; telehealth; telemedicine; TMLE; causal inference; machine learning; acute respiratory infections; antimicrobial resistance; covid; ARTI; ARI; antibiotic prescribing; primary care

2023-03-13 — Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review.

Authors: Matthew J Smith, Rachael V. Phillips, M. Luque-Fernández, C. Maringe
Year: 2023
Publication Date: 2023-03-13
Venue: Annals of Epidemiology
DOI: 10.1016/j.annepidem.2023.06.004
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2023-03-04 — Covariate‐adjusted response‐adaptive designs based on semiparametric approaches

Authors: Hai Zhu, Hongjian Zhu
Year: 2023
Publication Date: 2023-03-04
Venue: Biometrics
DOI: 10.1111/biom.13849
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We consider theoretical and practical issues for innovatively using a large number of covariates in clinical trials to achieve various design objectives without model misspecification. Specifically, we propose a new family of semiparametric covariate‐adjusted response‐adaptive randomization (CARA) designs and we use the targeted maximum likelihood estimation (TMLE) to analyze the correlated data from CARA designs. Our approach can flexibly achieve multiple objectives and correctly incorporate the effect of a large number of covariates on the responses without model misspecification. We also obtain the consistency and asymptotic normality of the target parameters, allocation probabilities, and allocation proportions. Numerical studies demonstrate that our approach has advantages over existing approaches, even when the data‐generating distribution is complicated.

2023-03-02 — Human Immunodeficiency Virus Status, Tenofovir Exposure, and the Risk of Poor Coronavirus Disease 19 (COVID-19) Outcomes: Real-World Analysis From 6 United States Cohorts Before Vaccine Rollout.

Authors: A. Lea, W. Leyden, O. Sofrygin, B. Marafino, J. Skarbinski, S. Napravnik, D. Agil, M. Augenbraun, L. Benning, M. Horberg, Celeena R. Jefferson, V. Marconi, L. Park, K. Gordon, L. Bastarache, Srushti Gangireddy, K. Althoff, S. Coburn, K. Gebo, R. Lang, C. Williams, M. Silverberg
Year: 2023
Publication Date: 2023-03-02
Venue: Clinical Infectious Diseases
DOI: 10.1093/cid/ciad084
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND People with human immunodeficiency virus (HIV) (PWH) may be at increased risk for severe coronavirus disease 2019 (COVID-19) outcomes. We examined HIV status and COVID-19 severity, and whether tenofovir, used by PWH for HIV treatment and people without HIV (PWoH) for HIV prevention, was associated with protection. METHODS Within 6 cohorts of PWH and PWoH in the United States, we compared the 90-day risk of any hospitalization, COVID-19 hospitalization, and mechanical ventilation or death by HIV status and by prior exposure to tenofovir, among those with severe acute respiratory syndrome coronavirus 2 infection between 1 March and 30 November 2020. Adjusted risk ratios (aRRs) were estimated by targeted maximum likelihood estimation, with adjustment for demographics, cohort, smoking, body mass index, Charlson comorbidity index, calendar period of first infection, and CD4 cell counts and HIV RNA levels (in PWH only). RESULTS Among PWH (n = 1785), 15% were hospitalized for COVID-19 and 5% received mechanical ventilation or died, compared with 6% and 2%, respectively, for PWoH (n = 189 351). Outcome prevalence was lower for PWH and PWoH with prior tenofovir use. In adjusted analyses, PWH were at increased risk compared with PWoH for any hospitalization (aRR, 1.31 [95% confidence interval, 1.20-1.44]), COVID-19 hospitalizations (1.29 [1.15-1.45]), and mechanical ventilation or death (1.51 [1.19-1.92]). Prior tenofovir use was associated with reduced hospitalizations among PWH (aRR, 0.85 [95% confidence interval, .73-.99]) and PWoH (0.71 [.62-.81]). CONCLUSIONS Before COVID-19 vaccine availability, PWH were at greater risk for severe outcomes than PWoH. Tenofovir was associated with a significant reduction in clinical events for both PWH and PWoH.

2023-03-01 — What if we wait? Using synthetic waiting lists to estimate treatment effects in routine outcome data

Authors: T. Kaiser, E. Brakemeier, P. Herzog
Year: 2023
Publication Date: 2023-03-01
Venue: Psychotherapy Research
DOI: 10.1080/10503307.2023.2182241
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Objective: Due to the lack of randomization, pre–post routine outcome data precludes causal conclusions. We propose the “synthetic waiting list” (SWL) control group to overcome this limitation. Method: First, a step-by-step introduction illustrates this novel approach. Then, this approach is demonstrated using an empirical example with data from an outpatient cognitive–behavioral therapy (CBT) clinic (N = 139). We trained an ensemble machine learning model (“Super Learner”) on a data set of patients waiting for treatment (N = 311) to make counterfactual predictions of symptom change during this hypothetical period. Results: The between-group treatment effect was estimated to be d = 0.42. Of the patients who received CBT, 43.88% achieved reliable and clinically significant change, while this probability was estimated to be 14.54% in the SWL group. Counterfactual estimates suggest a clear net benefit of psychotherapy for 41% of patients. In 32%, the benefit was unclear, and 27% would have improved similarly without receiving CBT. Conclusions: The SWL is a viable new approach that provides between-group outcome estimates similar to those reported in the literature comparing psychotherapy with high-intensity control interventions. It holds the potential to mitigate common limitations of routine outcome data analysis.

2023-03-01 — Longitudinal associations between specific types/amounts social contact and cognitive function among middle-aged and elderly Chinese: A causal inference and longitudinal targeted maximum likelihood estimation analysis.

Authors: Yemian Li, Yuhui Yang, Peng Zhao, Jingxian Wang, B. Mi, Yaling Zhao, L. Pei, Hong Yan, Fangyao Chen
Year: 2023
Publication Date: 2023-03-01
Venue: Journal of Affective Disorders
DOI: 10.1016/j.jad.2023.03.039
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2023-02-28 — Practice Patterns and Outcomes Associated With Anticoagulation Use Following Sepsis Hospitalizations With New-Onset Atrial Fibrillation

Authors: A. Walkey, L. C. Myers, K. Thai, P. Kipnis, M. Desai, A. Go, Yun Lu, H. Clancy, Y. Devis, R. Neugebauer, V. Liu
Year: 2023
Publication Date: 2023-02-28
Venue: Circulation. Cardiovascular Quality and Outcomes
DOI: 10.1161/CIRCOUTCOMES.122.009494
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Background: Practice patterns and outcomes associated with the use of oral anticoagulation for arterial thromboembolism prevention following a hospitalization with new-onset atrial fibrillation (AF) during sepsis are unclear. Methods: Retrospective, observational cohort study of patients ≥40 years of age discharged alive following hospitalization with new-onset AF during sepsis across 21 hospitals in the Kaiser Permanente Northern California health care delivery system, years 2011 to 2018. Primary outcomes were ischemic stroke/transient ischemic attack (TIA), with a safety outcome of major bleeding events, both within 1 year of discharge alive from sepsis hospitalization. Adjusted risk differences for outcomes between patients who did and did not receive oral anticoagulation within 30 days of discharge were estimated using marginal structural models fitted by inverse probability weighting using Super Learning within a target trial emulation framework. Results: Among 82 748 patients hospitalized with sepsis, 3992 (4.8%) had new-onset AF and survived to hospital discharge; mean age was 78±11 years, 53% were men, and 70% were White. Patients with new-onset AF during sepsis averaged 45±33% of telemetry monitoring entries with AF, and 27% had AF present on the day of hospital discharge. Within 1 year of hospital discharge, 89 (2.2%) patients experienced stroke/TIA, 225 (5.6%) had major bleeding, and 1011 (25%) died. Within 30 days of discharge, 807 (20%) patients filled oral anticoagulation prescriptions, which were associated with higher 1-year adjusted risks of ischemic stroke/TIA (5.69% versus 2.32%; risk difference, 3.37% [95% CI, 0.36–6.38]) and no significant difference in 1-year adjusted risks of major bleeding (6.51% versus 7.10%; risk difference, −0.59% [95% CI, −3.09 to 1.91]). Sensitivity analysis of ischemic stroke–only outcomes showed a risk difference of 0.15% (95% CI, −1.72 to 2.03). 
Conclusions: After hospitalization with new-onset AF during sepsis, oral anticoagulation use was uncommon and associated with potentially higher stroke/TIA risk. Further research to inform mechanisms of stroke and TIA and management of new-onset AF after sepsis is needed.

2023-02-28 — AI-Based Automated Lipomatous Tumor Segmentation in MR Images: Ensemble Solution to Heterogeneous Data

Authors: Chih-Chieh Liu, Y. Abdelhafez, S. P. Yap, Francesco Acquafredda, S. Schirò, A. L. Wong, Dani Sarohia, C. Bateni, M. Darrow, M. Guindani, Sonia Lee, Michelle Zhang, A. Moawad, Q. K. Ng, Layla Shere, K. Elsayes, R. Maroldi, T. Link, L. Nardo, J. Qi
Year: 2023
Publication Date: 2023-02-28
Venue: Journal of digital imaging
DOI: 10.1007/s10278-023-00785-1
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Deep learning (DL) has been proposed to automate image segmentation and provide accuracy, consistency, and efficiency. Accurate segmentation of lipomatous tumors (LTs) is critical for correct tumor radiomics analysis and localization. The major challenge of this task is data heterogeneity, including tumor morphological characteristics and multicenter scanning protocols. To mitigate the issue, we aimed to develop a DL-based Super Learner (SL) ensemble framework with different data correction and normalization methods. Pathologically proven LTs on pre-operative T1-weighted/proton-density MR images of 185 patients were manually segmented. The LTs were categorized by tumor locations as distal upper limb (DUL), distal lower limb (DLL), proximal upper limb (PUL), proximal lower limb (PLL), or Trunk (T) and grouped by 80%/9%/11% for training, validation and testing. Six configurations of correction/normalization were applied to data for fivefold-cross-validation trainings, resulting in 30 base learners (BLs). A SL was obtained from the BLs by optimizing SL weights. The performance was evaluated by dice-similarity-coefficient (DSC), sensitivity, specificity, and Hausdorff distance (HD95). 
For predictions of the BLs, the average DSC, sensitivity, and specificity from the testing data were 0.72 ± 0.16, 0.73 ± 0.168, and 0.99 ± 0.012, respectively, while for SL predictions were 0.80 ± 0.184, 0.78 ± 0.193, and 1.00 ± 0.010. The average HD95 of the BLs were 11.5 (DUL), 23.2 (DLL), 25.9 (PUL), 32.1 (PLL), and 47.9 (T) mm, whereas of SL were 1.7, 8.4, 15.9, 2.2, and 36.6 mm, respectively. 
The proposed method could improve the segmentation accuracy and mitigate the performance instability and data heterogeneity aiding the differential diagnosis of LTs in real clinical situations.

2023-02-24 — Association of statin use with outcomes of patients admitted with COVID-19: an analysis of electronic health records using superlearner

Authors: A. Rivera, Omar Al-Heeti, Lucia Petito, Matt Feinstein, C. Achenbach, Janna L. Williams, B. Taiwo
Year: 2023
Publication Date: 2023-02-24
Venue: BMC Infectious Diseases
DOI: 10.1186/s12879-023-08026-0
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Importance Statin use prior to hospitalization for Coronavirus Disease 2019 (COVID-19) is hypothesized to improve inpatient outcomes including mortality, but prior findings from large observational studies have been inconsistent, due in part to confounding. Recent advances in statistics, including incorporation of machine learning techniques into augmented inverse probability weighting with targeted maximum likelihood estimation, address baseline covariate imbalance while maximizing statistical efficiency. Objective To estimate the association of antecedent statin use with progression to severe inpatient outcomes among patients admitted for COVID-19. Design, setting and participants We retrospectively analyzed electronic health records (EHR) from individuals ≥ 40-years-old who were admitted between March 2020 and September 2022 for ≥ 24 h and tested positive for SARS-CoV-2 infection in the 30 days before to 7 days after admission. Exposure Antecedent statin use—statin prescription ≥ 30 days prior to COVID-19 admission. Main outcome Composite end point of in-hospital death, intubation, and intensive care unit (ICU) admission. Results Of 15,524 eligible COVID-19 patients, 4412 (20%) were antecedent statin users. Compared with non-users, statin users were older (72.9 (SD: 12.6) versus 65.6 (SD: 14.5) years) and more likely to be male (54% vs. 51%), White (76% vs. 71%), and have ≥ 1 medical comorbidity (99% vs. 86%). Unadjusted analysis demonstrated that a lower proportion of antecedent users experienced the composite outcome (14.8% vs 19.3%), ICU admission (13.9% vs 18.3%), intubation (5.1% vs 8.3%) and inpatient deaths (4.4% vs 5.2%) compared with non-users. Risk differences adjusted for labs and demographics were estimated using augmented inverse probability weighting with targeted maximum likelihood estimation using Super Learner. 
Statin users still had lower rates of the composite outcome (adjusted risk difference: −3.4%; 95% CI: −4.6% to −2.1%), ICU admissions (−3.3%; −4.5% to −2.1%), and intubation (−1.9%; −2.8% to −1.0%) but comparable inpatient deaths (0.6%; −1.3% to 0.1%). Conclusions and relevance After controlling for confounding using doubly robust methods, antecedent statin use was associated with minimally lower risk of severe COVID-19-related outcomes, ICU admission and intubation; however, we were not able to corroborate a statin-associated mortality benefit. Question Is statin use prior to hospital admission for COVID-19 associated with reducing severe inpatient outcomes? Findings In this observational study using electronic health records from a multi-hospital health system in Chicago, we used robust statistical methods to account for confounding and found that adults 40 years or older who were prescribed statins prior to admission for COVID-19 had minimally lower rates of intubation and admission to the intensive care unit. However, inpatient mortality was comparable between statin users and non-users. Meaning Consistent with current COVID-19 treatment guidelines, we did not find evidence supporting the utilization of statins for clinically significant reduction in severe inpatient COVID-19 outcomes.

2023-02-21 — CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R

Authors: David McCoy, A. Hubbard, M. J. Laan
Year: 2023
Publication Date: 2023-02-21
Venue: Journal of Open Source Software
DOI: 10.21105/joss.04181
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Summary Statistical causal inference of mixed exposures has been limited by reliance on parametric models and, until recently, by researchers considering only one exposure at a time, usually estimated as a beta coefficient in a generalized linear regression model (GLM). This independent assessment of exposures poorly estimates the joint impact of a collection of the same exposures in a realistic exposure setting. Marginal methods for mixture variable selection such as ridge/lasso regression are biased by linear assumptions and the interactions modeled are chosen by the user. Clustering methods such as principal component regression lose both interpretability and valid inference. Newer mixture methods such as quantile g-computation (Keil et al., 2020) are biased by linear/additive assumptions. More flexible methods such as Bayesian kernel machine regression (BKMR) (Bobb et al., 2014) are sensitive to the choice of tuning parameters, are computationally taxing and lack an interpretable and robust summary statistic of dose-response relationships. No method currently exists that finds the best flexible model to adjust for covariates while applying a non-parametric model that targets for interactions in a mixture and delivers valid inference for a target parameter. Non-parametric methods such as decision trees are a useful tool to evaluate combined exposures by finding partitions in the joint-exposure (mixture) space that best explain the variance in an outcome. However, current methods using decision trees to assess statistical inference for interactions are biased and are prone to overfitting by using the full data to both identify nodes in the tree and make statistical inference given these nodes. Other methods have used an independent test set to derive inference which does not use the full data. 
The CVtreeMLE R package provides researchers in (bio)statistics, epidemiology, and environmental health sciences with access to state-of-the-art statistical methodology for evaluating the causal effects of a data-adaptively determined mixed exposure using decision trees. Our target audience are those analysts who would normally use a potentially biased GLM based model for a mixed exposure. Instead, we hope to provide users with a non-parametric statistical machine where users simply specify the exposures, covariates and outcome, CVtreeMLE then determines if a best fitting decision tree exists and delivers interpretable results.

2023-02-15 — Discovery of critical thresholds in mixed exposures and estimation of policy intervention effects

Authors: David McCoy, A. Hubbard, Mark van der Laan, Alejandro Schuler
Year: 2023
Publication Date: 2023-02-15
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2024-0056
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Regulations of chemical exposures often focus on individual substances, neglecting the amplified toxicity that can arise from multiple concurrent exposures. We propose a novel methodology to identify critical thresholds in multivariate exposure spaces and estimate the effects of policy interventions that limit exposures within these thresholds. Our approach employs a recursive partitioning algorithm integrated with targeted maximum likelihood estimation (TMLE) to discover regions in the exposure space where the expected outcome is minimized or maximized. To address potential overfitting bias from using the same data for threshold discovery and effect estimation, we utilize cross-validated TMLE (CV-TMLE), which ensures asymptotic unbiasedness and efficiency. Simulation studies demonstrate convergence to the optimal exposure region and accurate estimation of intervention effects. We apply our method to synthetic mixture data, successfully identifying true interactions, and to NHANES data, discovering harmful metal exposures affecting telomere length. Our approach provides a flexible and interpretable framework for policy-makers to assess the impact of exposure regulations, and we offer an open-source implementation in the CVtreeMLE R package.

2023-02-03 — VR-LENS: Super Learning-based Cybersickness Detection and Explainable AI-Guided Deployment in Virtual Reality

Authors: Ripan Kumar Kundu, Osama Yahia Elsaid, P. Calyam, Khaza Anuarul Hoque
Year: 2023
Publication Date: 2023-02-03
Venue: International Conference on Intelligent User Interfaces
DOI: 10.1145/3581641.3584044
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Virtual reality (VR) systems are known for their susceptibility to cybersickness, which can seriously hinder users’ experience. Therefore, a plethora of recent research has proposed several automated methods based on machine learning (ML) and deep learning (DL) to detect cybersickness. However, these detection methods are perceived as computationally intensive and black-box methods. Thus, those techniques are neither trustworthy nor practical for deploying on standalone VR head-mounted displays (HMDs). This work presents an explainable artificial intelligence (XAI)-based framework VR-LENS for developing cybersickness detection ML models, explaining them, reducing their size, and deploying them in a Qualcomm Snapdragon 750G processor-based Samsung A52 device. Specifically, we first develop a novel super learning-based ensemble ML model for cybersickness detection. Next, we employ a post-hoc explanation method, such as SHapley Additive exPlanations (SHAP), Morris Sensitivity Analysis (MSA), Local Interpretable Model-Agnostic Explanations (LIME), and Partial Dependence Plot (PDP) to explain the expected results and identify the most dominant features. The super learner cybersickness model is then retrained using the identified dominant features. Our proposed method identified eye tracking, player position, and galvanic skin/heart rate response as the most dominant features for the integrated sensor, gameplay, and bio-physiological datasets. We also show that the proposed XAI-guided feature reduction significantly reduces the model training and inference time by 1.91X and 2.15X while maintaining baseline accuracy. For instance, using the integrated sensor dataset, our reduced super learner model outperforms the state-of-the-art works by classifying cybersickness into 4 classes (none, low, medium, and high) with an accuracy of and regressing (FMS 1–10) with a Root Mean Square Error (RMSE) of 0.03. 
Our proposed method can help researchers analyze, detect, and mitigate cybersickness in real time and deploy the super learner-based cybersickness detection model in standalone VR headsets.

2023-02-03 — Time-varying exposure analysis of the relationship between sustained natural dentition and cognitive decline.

Authors: Y. Matsuyama
Year: 2023
Publication Date: 2023-02-03
Venue: Journal of Clinical Periodontology
DOI: 10.1111/jcpe.13786
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
AIM Tooth loss and cognitive decline progress over time and influence each other. This study estimated the impact of sustaining natural dentition on cognitive function in US adults, accounting for the fact that dental and cognitive statuses change over time. MATERIALS AND METHODS Data from adults aged ≥51 years who participated in five waves of the Health and Retirement Study from 2004 to 2016 (N = 10,953) were analyzed. The impact of retaining some natural teeth from 2006 to 2012 on cognitive function score (0-27) and cognitive impairment (defined as having a cognitive function score of <12) in 2016 was evaluated using the doubly robust targeted maximum likelihood estimation method by considering both time-invariant and time-varying confounders, including cognitive function at baseline and during follow-up. RESULTS Respondents with some natural teeth between 2006 and 2012 had a 0.40 point (95% confidence interval [CI]: 0.10, 0.71) higher cognitive function score and 3.27 percentage-point (95% CI: 0.11, 6.66) lower cognitive impairment prevalence in 2016 than those with complete tooth loss. CONCLUSION Considering past cognitive function assessed at multiple time points, sustained natural dentition was associated with better cognitive function.

2023-02-01 — Cervical pessary for preterm birth prevention after an episode of arrested preterm labor: a retrospective cohort study with targeted maximum likelihood estimation of the average treatment effect.

Authors: G. Delli Carpini, Luca Giannella, M. Carboni, M. Fichera, D. Pizzagalli, N. Segnalini, C. Conti, E. Tafuri, L. Giuliani, F. Ragno, C. Mancusi, S. Giannubilo, A. Ciavattini
Year: 2023
Publication Date: 2023-02-01
Venue: European Review for Medical and Pharmacological Sciences
DOI: 10.26355/eurrev_202302_31202
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2023-02-01 — Biohydrogen from food waste: Modeling and estimation by machine learning based super learner approach

Authors: N. Sultana, S. Hossain, Sumayh S. Aljameel, M.E. Omran, S. Razzak, B. Haq, M. M. Hossain
Year: 2023
Publication Date: 2023-02-01
Venue: International journal of hydrogen energy
DOI: 10.1016/j.ijhydene.2023.01.339
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023-01-31 — Higher Order Spline Highly Adaptive Lasso Estimators of Functional Parameters: Pointwise Asymptotic Normality and Uniform Convergence Rates

Authors: M. Laan
Year: 2023
Publication Date: 2023-01-31
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
We consider estimation of a functional of the data distribution based on i.i.d. observations. We assume the target function can be defined as the minimizer of the expectation of a loss function over a class of $d$-variate real valued cadlag functions that have finite sectional variation norm. For all $k=0,1,\ldots$, we define a $k$-th order smoothness class of functions as $d$-variate functions on the unit cube for which each of a sequentially defined $k$-th order Radon-Nikodym derivative w.r.t. Lebesgue measure is cadlag and of bounded variation. For a target function in this $k$-th order smoothness class we provide a representation of the target function as an infinite linear combination of tensor products of $\leq k$-th order spline basis functions indexed by a knot-point, where the lower (than $k$) order spline basis functions are used to represent the function at the $0$-edges. The $L_1$-norm of the coefficients represents the sum of the variation norms across all the $k$-th order derivatives, which is called the $k$-th order sectional variation norm of the target function. This generalizes the zero order spline representation of cadlag functions with bounded sectional variation norm to higher order smoothness classes. We use this $k$-th order spline representation of a function to define the $k$-th order spline sieve minimum loss estimator (MLE), Highly Adaptive Lasso (HAL) MLE, and Relax HAL-MLE. For first and higher order smoothness classes, in this article we analyze these three classes of estimators and establish pointwise asymptotic normality and uniform convergence at dimension free rate $n^{-k^*/(2k^*+1)}$ up till a power of $\log n$ depending on the dimension, where $k^*=k+1$, assuming appropriate undersmoothing is used in selecting the $L_1$-norm. We also establish asymptotic linearity of plug-in estimators of pathwise differentiable features of the target function.

2023-01-31 — CHURN ANALISIS PADA DATA PELANGGAN TELEKOMUNIKASI MENGGUNAKAN ENSEMBLE LEARNING

Authors: Muthia Nadhira Faladiba, Rizqi Haryastuti
Year: 2023
Publication Date: 2023-01-31
Venue: STATMAT : JURNAL STATISTIKA DAN MATEMATIKA
DOI: 10.32493/sm.v5i1.31934
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Intense competition in broadband services will create high opportunities for consumers to switch providers, such as conditions that arise in competition for SMS, telephone, and internet services. The churn rate is the percentage of consumers who stop subscribing to the service. Ideally, this churn percentage is only 5% – 10%, and if it exceeds this figure, it indicates the company's inability to retain customers. A high churn rate indicates a decline in the cellular operator's market share and affects the company's revenue. Based on these problems, it is necessary to analyze the churn behavior of broadband subscribers to determine the dissatisfaction factors of cellular telecommunications consumers. Then predictions are made for customers who tend to churn from provider companies and determine the characteristics of churn and stay customers. The ensemble method is used to detect churn, which consists of several methods, including random forest, boosting, and super learner. Random Forest is proven to produce the best classification method with an excellent ability to predict customer churn, which is 80.1%, with an average usage time of 3 years.

2023-01-27 — Multi-task Highly Adaptive Lasso

Authors: I. Malenica, Rachael V. Phillips, D. Lazzareschi, Jeremy Coyle, R. Pirracchio, M. J. Laan
Year: 2023
Publication Date: 2023-01-27
Venue: arXiv.org
DOI: 10.48550/arXiv.2301.12029
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
We propose a novel, fully nonparametric approach for multi-task learning, the Multi-task Highly Adaptive Lasso (MT-HAL). MT-HAL simultaneously learns features, samples and task associations important for the common model, while imposing a shared sparse structure among similar tasks. Given multiple tasks, our approach automatically finds a sparse sharing structure. The proposed MTL algorithm attains a powerful dimension-free convergence rate of $o_p(n^{-1/4})$ or better. We show that MT-HAL outperforms sparsity-based MTL competitors across a wide range of simulation studies, including settings with nonlinear and linear relationships, varying levels of sparsity and task correlations, and different numbers of covariates and sample sizes.

2023-01-25 — Inference in Marginal Structural Models by Automatic Targeted Bayesian and Minimum Loss-Based Estimation

Authors: Herbert Susmann, A. Chambaz
Year: 2023
Publication Date: 2023-01-25
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Two of the principal tasks of causal inference are to define and estimate the effect of a treatment on an outcome of interest. Formally, such treatment effects are defined as a possibly functional summary of the data generating distribution, and are referred to as target parameters. Estimation of the target parameter can be difficult, especially when it is high-dimensional. Marginal Structural Models (MSMs) provide a way to summarize such target parameters in terms of a lower dimensional working model. We introduce the semi-parametric efficiency bound for estimating MSM parameters in a general setting. We then present a frequentist estimator that achieves this bound based on Targeted Minimum Loss-Based Estimation. Our results are derived in a general context, and can be easily adapted to specific data structures and target parameters. We then describe a novel targeted Bayesian estimator and provide a Bernstein-von Mises type result analyzing its asymptotic behavior. We propose a universal algorithm that uses automatic differentiation to put the estimator into practice for arbitrary choice of working model. The frequentist and Bayesian estimators have been implemented in the Julia software package TargetedMSM.jl. Finally, we illustrate our proposed methods by investigating the effect of interventions on family planning behavior using data from a randomized field experiment conducted in Malawi.

2023-01-17 — The Targeted Maximum Likelihood estimation to estimate the causal effects of the previous tuberculosis treatment in Multidrug-resistant tuberculosis in Sudan

Authors: A. Elduma, K. Holakouie-Naieni, A. Almasi-Hashiani, A. Rahimi Foroushani, Hamdan Mustafa Hamdan Ali, M. A. Adam, A. Elsony, Mohammad Ali Mansournia
Year: 2023
Publication Date: 2023-01-17
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0279976
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Introduction This study used Targeted Maximum Likelihood Estimation (TMLE) as a double robust method to estimate the causal effect of previous tuberculosis treatment history on the occurrence of multidrug-resistant tuberculosis (MDR-TB). TMLE is a method to estimate the marginal statistical parameters in case-control study design. The aim of this study was to estimate the causal effect of the previous tuberculosis treatment on the occurrence of MDR-TB using TMLE in Sudan. Method A case-control study design combined with TMLE was used to estimate parameters. Cases were MDR-TB patients and controls were patients who had been cured of tuberculosis. The history of previous TB treatment was considered the main exposure, and MDR-TB the outcome. A designed questionnaire was used to collect a set of covariates including age, time to reach a health facility, number of times stopping treatment, gender, education level, and contact with MDR-TB cases. The TMLE method was used to estimate the causal association of parameters. Statistical analysis was carried out with the ltmle package in R. Results are presented in graphs and tables. Results A total of 430 cases and 860 controls were included in this study. The estimated risk difference of the previous tuberculosis treatment was (0.189, 95% CI; 0.161, 0.218) with SE 0.014, and p-value (<0.001). In addition, the estimated risk ratio was (16.1, 95% CI; 12.932, 20.001) with SE = 0.014 and p-value (<0.001). Conclusion Our findings indicated that previous tuberculosis treatment history was determined to be a risk factor for MDR-TB in Sudan. Also, the TMLE method can be used to estimate the risk difference and the risk ratio in a case-control study design.

2023-01-17 — Comparing g-computation, propensity score-based weighting, and targeted maximum likelihood estimation for analyzing externally controlled trials with both measured and unmeasured confounders: a simulation study

Authors: Jinma Ren, P. Cislo, J. Cappelleri, P. Hlavacek, M. Dibonaventura
Year: 2023
Publication Date: 2023-01-17
Venue: BMC Medical Research Methodology
DOI: 10.1186/s12874-023-01835-6
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Objectives To have confidence in one's interpretation of treatment effects assessed by comparing trial results to external controls, minimizing bias is a critical step. We sought to investigate different methods for causal inference in simulated data sets with measured and unmeasured confounders. Methods The simulated data included three types of outcomes (continuous, binary, and time-to-event), treatment assignment, two measured baseline confounders, and one unmeasured confounding factor. Three scenarios were set to create different intensities of confounding effect (e.g., small and blocked confounding paths, medium and blocked confounding paths, and one large unblocked confounding path for scenarios 1 to 3, respectively) caused by the unmeasured confounder. The methods of g-computation (GC), inverse probability of treatment weighting (IPTW), overlap weighting (OW), standardized mortality/morbidity ratio (SMR), and targeted maximum likelihood estimation (TMLE) were used to estimate average treatment effects and reduce potential biases. Results The results with the greatest extent of biases were from the raw model that ignored all the potential confounders. In scenario 2, the unmeasured factor indirectly influenced the treatment assignment through a measured controlling factor and led to medium confounding. The methods of GC, IPTW, OW, SMR, and TMLE removed most of the bias observed in average treatment effects for all three types of outcomes from the raw model. Similar results were found in scenario 1, but the results tended to be biased in scenario 3. GC had the best performance followed by OW. Conclusions The aforesaid methods can be used for causal inference in externally controlled studies when there is no large, unblockable confounding path for an unmeasured confounder. GC and OW are the preferable approaches.

2023-01-16 — Personalized online ensemble machine learning with applications for dynamic data streams

Authors: I. Malenica, Rachael V. Phillips, A. Chambaz, A. Hubbard, R. Pirracchio, M. J. van der Laan
Year: 2023
Publication Date: 2023-01-16
Venue: Statistics in Medicine
DOI: 10.1002/sim.9655
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In this work we introduce the personalized online super learner (POSL), an online personalizable ensemble machine learning algorithm for streaming data. POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized, that is, optimization with respect to subject ID, to many individuals, that is, optimization with respect to common baseline covariates. As an online algorithm, POSL learns in real time. As a super learner, POSL is grounded in statistical optimality theory and can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed/offline algorithms that are not updated during POSL's fitting procedure, pooled algorithms that learn from many individuals' time series, and individualized algorithms that learn from within a single time series. POSL's ensembling of the candidates can depend on the amount of data collected, the stationarity of the time series, and the mutual characteristics of a group of time series. Depending on the underlying data‐generating process and the information available in the data, POSL is able to adapt to learning across samples, through time, or both. For a range of simulations that reflect realistic forecasting scenarios and in a medical application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for both short and long time series, and it's able to adjust to changing data‐generating environments. We further cultivate POSL's practicality by extending it to settings where time series dynamically enter and exit.

2023-01-09 — The impact of disease changes and mental health illness on readapted return to work after repeated sick leaves among Brazilian public university employees

Authors: Adriano Dias, H. R. C. Nunes, C. Ruiz-Frutos, Juan Gómez-Salgado, Melissa Spröesser Alonso, João Marcos Bernardes, J. García-Iglesias, J. R. Lacalle-Remigio
Year: 2023
Publication Date: 2023-01-09
Venue: Frontiers in Public Health
DOI: 10.3389/fpubh.2022.1026053
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Introduction Health affects work absenteeism and productivity of workers, making it a relevant marker of an individual's professional development. Objectives The aims of this article were to investigate whether changes in the main cause of the sick leaves and the presence of mental health illnesses are associated with return to work with readaptation. Materials and methods A historical cohort study was carried out with non-work-related illnesses suffered by statutory workers of university campuses in a medium-sized city in the state of São Paulo, Brazil. Two exposures were measured: (a) changes, throughout medical examinations, in the International Classification of Diseases (ICD-10) chapter regarding the main condition for the sick leave; and (b) having at least one episode of sick leave due to mental illness, with or without change in the ICD-10 chapter over the follow-up period. The outcome was defined as return to work with adapted conditions. The causal model was established a priori and tested using a multiple logistic regression (MLR) model considering the effects of several confounding factors, and then compared with the same estimators obtained using Targeted Machine Learning. Results Among workers in adapted conditions, 64% were health professionals, 34% had had changes in the ICD-10 chapter throughout the series of sick leaves, and 62% had diagnoses of mental health issues. In addition, they worked for less time at the university and were absent for longer periods. Having had a change in the illness condition reduced the chance of returning to work in another function by more than 30%, whereas having had at least one absence because of a cause related to mental and behavioral disorders more than doubled the chance of not returning to work in the same activity as before. 
Conclusion These results were independent of the analysis technique used, which allows concluding that there were no advantages in the use of targeted maximum likelihood estimation (TMLE), given its difficulties in access, use, and assumptions.

2023-01-07 — Development and evaluation of a risk algorithm predicting alcohol dependence after early onset of regular alcohol use.

Authors: C. Bharat, M. Glantz, S. Aguilar-Gaxiola, J. Alonso, R. Bruffaerts, B. Bunting, J. Caldas-de-Almeida, G. Cardoso, Stephanie Chardoul, P. de Jonge, O. Gureje, J. Haro, M. Harris, E. Karam, N. Kawakami, A. Kiejna, V. Kovess-Masfety, Sing Lee, J. Mcgrath, J. Moskalewicz, F. Navarro-Mateu, C. Rapsey, N. Sampson, K. Scott, H. Tachimori, M. ten Have, G. Vilagut, B. Wojtyniak, M. Xavier, R. Kessler, L. Degenhardt
Year: 2023
Publication Date: 2023-01-07
Venue: Addiction
DOI: 10.1111/add.16122
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
AIM Likelihood of alcohol dependence (AD) is increased among people who transition to greater levels of alcohol involvement at a younger age. Indicated interventions delivered early may be effective in reducing risk but could be costly. One way to increase cost-effectiveness would be to develop a prediction model that targeted interventions to the subset of youth with early alcohol use who are at highest risk of subsequent AD. DESIGN A prediction model was developed for DSM-IV AD onset by age 25 using an ensemble machine learning algorithm known as super learner. Shapley additive explanations (SHAP) assessed variable importance. SETTING AND PARTICIPANTS Respondents reporting early onset of regular alcohol use (i.e., by 17 years of age) who were aged 25 years or older at interview from 14 representative community surveys conducted in 13 countries as part of WHO's World Mental Health Surveys. MEASUREMENTS The primary outcome to be predicted was onset of lifetime DSM-IV AD by age 25 as measured using the Composite International Diagnostic Interview, a fully structured diagnostic interview. FINDINGS: AD prevalence by age 25 was 5.1% across the 10,687 individuals who reported drinking alcohol regularly by age 17. The prediction model achieved an external area under the curve (0.78; 95% confidence interval [CI] 0.74-0.81) higher than any individual candidate risk model (0.73-0.77) and an area under the precision-recall curve of 0.22. Overall calibration was good (ICI, 1.05%); however, miscalibration was observed at the extreme ends of the distribution of predicted probabilities. Interventions provided to the 20% of people with highest risk would identify 49% of AD cases and require treating four people without AD to reach one with AD. Important predictors of increased risk included younger onset of alcohol use, males, higher cohort alcohol use and more mental disorders.
CONCLUSION A risk algorithm can be created using data collected at the onset of regular alcohol use to target youth at highest risk of alcohol dependence by early adulthood. Important considerations remain for advancing the development and practical implementation of such models.

2023-01-06 — PNL y superaprendizaje: Herramientas fortalecedoras en educación global en el s. XXI [NLP and Super Learning: Strengthening Tools in Global Education in the 21st Century]

Authors: Maricarmen Coromoto Soto Ortigoza, Lisandro Labrador
Year: 2023
Publication Date: 2023-01-06
Venue: REVISTA CIENTIFICA SAPERES UNIVERSITAS
DOI: 10.53485/rsu.v6i1.336
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Globality, generation of knowledge, and innovation in education lead to studying university organizations with traditional structures towards innovative models. Successful experiences have been observed in other latitudes based on Suggestive Pedagogy, which can be transferred to Latin America, such as the US, Asia, Europe with technological platforms, such as Bobbi's Quantum Learning, Super Camp, all in this current created by Luzanov. Therefore, the general objective is to analyze NLP and Super Learning as strengthening tools in global education from the perspective of neurosciences. Framed in a complementary paradigm, the methodology was based on an exploratory, descriptive, non-experimental field study, phenomenological in the community work of higher education students, based on the postulates of Ausubel, UNESCO, Bandler and Grinder. Among the results, the strategies for acquiring knowledge in an accelerated way and the definition of thought processes derived from transpersonal learning, with Super Learning, are observed. Likewise, an action plan based on NLP for the transformation of thoughts and emotions would support the execution of the teacher's role from the Syllabus with the use of gamifying, dynamic and inclusive materials provided by NLP. The conclusions consolidate these tools based on NLP for meaningful learning, that the student body is formed as a social servant, developing a pleasant quality work and above all with ethics, in the communities. Create less competitive elements and achieve true didactic strategies that accelerate learning in higher education entities linking University-Society, with the support of technology, motivating it in these difficult times, to an expansion and rigorous training of educators.

2023 — Super learner approach to predict total organic carbon using stacking machine learning models based on well logs

Authors: L. Goliatt, C. Saporetti, E. Pereira
Year: 2023
Venue: Fuel
DOI: 10.1016/j.fuel.2023.128682
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023 — Prediction of Health Status of Battery Using Super Learner Algorithm

Authors: Sureshpandi P, Dr. S. Sharanya
Year: 2023
Venue: Recent Trends in Data Science and its Applications
DOI: 10.13052/rp-9788770040723.156
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023 — Modeling of microbial fuel cell power generation using machine learning-based super learner algorithms

Authors: S. Zakir Hossain, N. Sultana, Shaker Haji, Shaikha Talal Mufeez, Sara Esam Janahi, Noof Adel Ahmed
Year: 2023
Venue: Fuel
DOI: 10.1016/j.fuel.2023.128646
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023 — Modeling of capacitance for carbon-based supercapacitors using Super Learner algorithm

Authors: J. Abdi, T. Pirhoushyaran, Fahimeh Hadavimoghaddam, Seyed Ali Madani, Abdolhossein Hemmati-Sarapardeh, S. H. Esmaeili-Faraj
Year: 2023
Venue: Journal of Energy Storage
DOI: 10.1016/j.est.2023.107376
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2023 — Efficient thyroid disorder identification with weighted voting ensemble of super learners by using adaptive synthetic sampling technique

Authors: Noor Afshan, Zohaib Mushtaq, Faten S. Alamri, Muhammad Farrukh Qureshi, N. Khan, I. Siddique
Year: 2023
Venue: AIMS Mathematics
DOI: 10.3934/math.20231238
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
There are millions of people suffering from thyroid disease all over the world. For thyroid cancer to be effectively treated and managed, a correct diagnosis is necessary. In this article, we suggest an innovative approach for diagnosing thyroid disease that combines an adaptive synthetic sampling method with weighted average voting (WAV) ensemble of two distinct super learners (SLs). Resampling techniques are used in the suggested methodology to correct the class imbalance in the datasets and a group of two SLs made up of various base estimators and meta-estimators is used to increase the accuracy of thyroid cancer identification. To assess the effectiveness of our suggested methodology, we used two publicly accessible datasets: the KEEL thyroid illness (Dataset1) and the hypothyroid dataset (Dataset2) from the UCI repository. The findings of using the adaptive synthetic (ADASYN) sampling technique in both datasets revealed considerable gains in accuracy, precision, recall and F1-score. The WAV ensemble of the two distinct SLs that were deployed exhibited improved performance when compared to prior existing studies on identical datasets and produced higher prediction accuracy than any individual model alone. The suggested methodology has the potential to increase the accuracy of thyroid cancer categorization and could assist with patient diagnosis and treatment. The WAV ensemble strategy computational complexity and the ideal choice of base estimators in SLs continue to be constraints of this study that call for further investigation.

2022 (96 papers)
2022-12-30 — Data-driven photometric redshift estimation from type Ia supernovae light curves

Authors: Felipe M F de Oliveira, M. V. Santos, R. Reis
Year: 2022
Publication Date: 2022-12-30
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Redshift measurement has always been a constant need in modern astronomy and cosmology. And as new surveys have been providing an immense amount of data on astronomical objects, the need to process such data automatically proves to be increasingly necessary. In this article, we use simulated data from the Dark Energy Survey, and from a pipeline originally created to classify supernovae, we developed a linear regression algorithm optimized through novel automated machine learning (AutoML) frameworks achieving an error score better than ordinary data pre-processing methods when compared with other modern algorithms (such as XGBOOST). Numerically, the photometric prediction RMSE of type Ia supernovae events was reduced from 0.16 to 0.09 and the RMSE of all supernovae types decreased from 0.20 to 0.14. Our pipeline consists of four steps: through spectroscopic data points we interpolate the light curve using Gaussian process fitting algorithm, then using a wavelet transform we extract the most important features of such curves; in sequence we reduce the dimensionality of such features through principal component analysis, and in the end we applied super learning techniques (stacked ensemble methods) through an AutoML framework dedicated to optimize the parameters of several different machine learning models, better resolving the problem. As a final check, we obtained probability distribution functions (PDFs) using Gaussian kernel density estimations through the predictions of more than 50 models trained and optimized by AutoML. Those PDFs were calculated to replicate the original curves that used SALT2 model, a model used for the simulation of the raw data itself.

2022-12-23 — Relation of gait measures with mild unilateral knee pain during walking using machine learning

Authors: K. Bacon, D. Felson, S. Jafarzadeh, V. Kolachalama, Jeffrey M. Hausdorff, Eran Gazit, N. Segal, C. Lewis, M. Nevitt, Deepak Kumar
Year: 2022
Publication Date: 2022-12-23
Venue: Scientific Reports
DOI: 10.1038/s41598-022-21142-2
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Gait alterations in those with mild unilateral knee pain during walking may provide clues to modifiable alterations that affect progression of knee pain and osteoarthritis (OA). To examine this, we applied machine learning (ML) approaches to gait data from wearable sensors in a large observational knee OA cohort, the Multicenter Osteoarthritis (MOST) study. Participants completed a 20-m walk test wearing sensors on their trunk and ankles. Parameters describing spatiotemporal features of gait and symmetry, variability and complexity were extracted. We used an ensemble ML technique (“super learning”) to identify gait variables in our cross-sectional data associated with the presence/absence of unilateral knee pain. We then used logistic regression to determine the association of selected gait variables with odds of mild knee pain. Of 2066 participants (mean age 63.6 [SD: 10.4] years, 56% female), 21.3% had mild unilateral pain while walking. Gait parameters selected in the ML process as influential included step regularity, sample entropy, gait speed, and amplitude dominant frequency, among others. In adjusted cross-sectional analyses, lower levels of step regularity (i.e., greater gait variability) and lower sample entropy (i.e., lower gait complexity) were associated with increased likelihood of unilateral mild pain while walking [aOR 0.80 (0.64–1.00) and aOR 0.79 (0.66–0.95), respectively].

2022-12-22 — Hypothetical interventions on emergency ambulance and prehospital acetylsalicylic acid administration in myocardial infarction patients presenting without chest pain

Authors: A. Møller, H. Rytgaard, E.H.A Mills, H. Christensen, S. Blomberg, F. Folke, K. Kragholm, F. Lippert, G. Gislason, L. Køber, T. Gerds, C. Torp-Pedersen
Year: 2022
Publication Date: 2022-12-22
Venue: BMC Cardiovascular Disorders
DOI: 10.1186/s12872-022-03000-1
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background Myocardial infarction (MI) patients presenting without chest pain are a diagnostic challenge. They receive suboptimal prehospital management and have high mortality. To elucidate potential benefits of improved management, we analysed expected outcome among non-chest pain MI patients if hypothetically they (1) received emergency ambulances/acetylsalicylic acid (ASA) as often as observed for chest pain patients, and (2) all received emergency ambulance/ASA. Methods We sampled calls to emergency and non-emergency medical services for patients hospitalized with MI within 24 h and categorized calls as chest pain/non-chest pain. Outcomes were 30-day mortality and a 1-year combined outcome of re-infarction, heart failure admission, and mortality. Targeted minimum loss-based estimation was used for all statistical analyses. Results Among 5418 calls regarding MI patients, 24% (1309) were recorded with non-chest pain. In total, 90% (3689/4109) of chest pain and 40% (525/1309) of non-chest pain patients received an emergency ambulance, and 73% (2668/3632) and 37% (192/518) of chest pain and non-chest pain patients received prehospital ASA. Providing ambulances to all non-chest pain patients was not associated with improved survival. Prehospital administration of ASA to all emergency ambulance transports of non-chest pain MI patients was expected to reduce 30-day mortality by 5.3% (CI 95%: [1.7%;9%]) from 12.8% to 7.4%. No significant reduction was found for the 1-year combined outcome (2.6% CI 95% [− 2.9%;8.1%]). In comparison, the observed 30-day mortality was 3% among ambulance-transported chest pain MI patients. Conclusions Our study found large differences in the prehospital management of MI patients with and without chest pain. Improved prehospital ASA administration to non-chest pain MI patients could possibly reduce 30-day mortality, but long-term effects appear limited. 
Non-chest pain MI patients are difficult to identify prehospital and possible unintended effects of ASA might outweigh the potential benefits of improving the prehospital management. Future research should investigate ways to improve the prehospital recognition of MI in the absence of chest pain.
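
Reader's check (added, not part of the abstract): the percentages quoted above can be recomputed from the raw counts given in the same abstract. The sketch below is a minimal arithmetic check; the dictionary labels and the helper name `pct` are illustrative choices, and only the counts and rounded percentages come from the abstract.

```python
# Counts and rounded percentages as quoted in the abstract:
# label -> (numerator, denominator, reported whole-number percent)
reported = {
    "non-chest pain share of all calls": (1309, 5418, 24),
    "chest pain, emergency ambulance": (3689, 4109, 90),
    "non-chest pain, emergency ambulance": (525, 1309, 40),
    "chest pain, prehospital ASA": (2668, 3632, 73),
    "non-chest pain, prehospital ASA": (192, 518, 37),
}


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


for label, (num, den, stated) in reported.items():
    computed = pct(num, den)
    # Flag whether the recomputed value rounds to the stated percentage.
    status = "matches" if round(computed) == stated else "differs"
    print(f"{label}: {computed:.1f}% (stated {stated}%, {status})")
```

The chest pain and non-chest pain ambulance denominators (4109 and 1309) also sum to the 5418 total calls reported.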

2022-12-16 — Electronic Structure and Hardness of Mn3N2 Synthesized under High Temperature and High Pressure

Authors: Shoufeng Zhang, Chao Zhou, Guiqian Sun, Xin Wang, K. Bao, P. Zhu, Jin-ming Zhu, Zhaoqing Wang, Xingbin Zhao, Q. Tao, Yufei Ge, Tian Cui
Year: 2022
Publication Date: 2022-12-16
Venue: Metals
DOI: 10.3390/met12122164
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
The hardness of materials is a complicated physical quantity, and the hardness models that are widely used do not function well for transition metal light element (TMLE) compounds. The overestimation of actual hardness is a common phenomenon in hardness models. In this work, high-quality Mn3N2 bulk samples were synthesized under high temperature and high pressure (HTHP) to investigate this issue. The hardness of Mn3N2 was found to be 9.9 GPa, which was higher than the hardness predicted using Guo’s model of 7.01 GPa. Through the combination of the first-principle simulations and experimental analysis, it was found that the metal bonds, which are generally considered not to contribute to the hardness of crystals, are of importance when evaluating the hardness of TMLE compounds. Metal bonds were found to improve the hardness in TMLEs without strong covalent bonds. This work provides new considerations for the design and synthesis of high-hardness TMLE materials, which can be used to form wear-resistant coatings over the surfaces of typical alloy materials such as stainless steels. Moreover, our findings provide a basis for establishing a more comprehensive theoretical model of hardness in TMLEs, which will provide further insight to improve the hardness values of various alloys.

2022-12-05 — Adaptive Sequential Surveillance with Network and Temporal Dependence

Authors: I. Malenica, Jeremy Coyle, M. J. Laan, M. Petersen
Year: 2022
Publication Date: 2022-12-05
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Strategic test allocation plays a major role in the control of both emerging and existing pandemics (e.g., COVID-19, HIV). Widespread testing supports effective epidemic control by (1) reducing transmission via identifying cases, and (2) tracking outbreak dynamics to inform targeted interventions. However, infectious disease surveillance presents unique statistical challenges. For instance, the true outcome of interest, one's positive infectious status, is often a latent variable. In addition, presence of both network and temporal dependence reduces the data to a single observation. As testing entire populations regularly is neither efficient nor feasible, standard approaches to testing recommend simple rule-based testing strategies (e.g., symptom based, contact tracing), without taking into account individual risk. In this work, we study an adaptive sequential design involving $n$ individuals over a period of $\tau$ time-steps, which allows for unspecified dependence among individuals and across time. Our causal target parameter is the mean latent outcome we would have obtained after one time-step, if, starting at time $t$ given the observed past, we had carried out a stochastic intervention that maximizes the outcome under a resource constraint. We propose an Online Super Learner for adaptive sequential surveillance that learns the optimal choice of testing strategies over time while adapting to the current state of the outbreak. Relying on a series of working models, the proposed method learns across samples, through time, or both, based on the underlying (unknown) structure in the data. We present an identification result for the latent outcome in terms of the observed data, and demonstrate the superior performance of the proposed strategy in a simulation modeling a residential university environment during the COVID-19 pandemic.

2022-12-01 — The Targeted Virtual Control Approach for Single-Arm Clinical Trials with External Controls

Authors: Yixin Fang, Sheng Zhong
Year: 2022
Publication Date: 2022-12-01
Venue: Statistics in Biopharmaceutical Research
DOI: 10.1080/19466315.2022.2154260
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
In single-arm clinical trials with external controls, usually the estimand of interest is defined as the average treatment effect on the treated (ATT), and external controls are leveraged to provide an estimator of the estimand. Recently, virtual control approaches have been proposed to predict the outcomes for experimental study subjects as if they were receiving the control treatment, resulting in so-called “virtual controls” for comparison with their observed outcomes under the experimental intervention. We consider the virtual control approaches within the causal inference framework, discussing the properties of the vanilla virtual control approach and the targeted virtual control approach. We illustrate via simulation that the targeted virtual control approach is doubly robust, whereas the vanilla virtual control approach is not. We demonstrate the targeted virtual control approach is the same as the targeted maximum likelihood estimator (TMLE) when targeted at the ATT estimand. Therefore, through the notion of virtual controls, we offer a tangible way to understand and interpret TMLE when targeted at the ATT estimand.

2022-12-01 — Forecasting domestic waste generation during successive COVID-19 lockdowns by Bidirectional LSTM super learner neural network

Authors: M. Jassim, G. Coskuner, N. Sultana, S. Hossain
Year: 2022
Publication Date: 2022-12-01
Venue: Applied Soft Computing
DOI: 10.1016/j.asoc.2022.109908
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2022-12-01 — EE147 Estimating the Causal Effect of Early Use of Erythropoietic Stimulating Agents in Intermediate-1 to Low-Risk MDS Patients: An Application of the Longitudinal Targeted Maximum Likelihood Estimation

Authors: Y. Zhang, N. Kreif, V. Gc, A. Bennett, A. Manca
Year: 2022
Publication Date: 2022-12-01
Venue: Value in Health
DOI: 10.1016/j.jval.2022.09.399
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2022-12-01 — 2238. Estimating the impact of antibiotic exposure on antibiotic resistance in uncomplicated UTI using machine learning causal inference

Authors: S. Kanjilal, Hyewon Jeong, Yidan Ma, Alex H. Wei, Kexin Yang, D. Sontag
Year: 2022
Publication Date: 2022-12-01
Venue: Open Forum Infectious Diseases
DOI: 10.1093/ofid/ofac492.1856
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background Incorporating an antibiotic’s propensity for engendering resistance to itself and other antibiotics is a potentially useful strategy for preventing antimicrobial resistance (AMR), but prospective studies have been difficult to generalize to outpatients and retrospective studies are prone to design errors and model misspecification. To address this gap, we apply causal inference with targeted maximum likelihood estimation (TMLE) using machine learning to data from the electronic health record to define the antibiotic use-resistance relationship for common outpatient therapies used to treat urinary tract infection (UTI). Methods We estimated the risk of AMR in response to treatment in a cohort of outpatients with uncomplicated UTI in the Mass General Brigham health system between 2016 and 2021. We sought to emulate a randomized controlled trial using the targeted maximum likelihood (TMLE) approach with logistic regression, random forests, multilayer perceptrons, and XGBoost to mitigate confounding by indication and to model the outcome (Figure 1). Potential confounders include demographics, comorbidities and prior microbiology, windowed in time for temporally varying features (Figure 2). We quantified the average treatment effect (ATE) of exposure to nitrofurantoin (NIT, target trial 1) or fluoroquinolones (FQs, target trial 2), relative to any other antibiotic type, on the risk of AMR to NIT, FQs or amoxicillin-clavulanate (AMC) at 12 months post-exposure. [Figure 1: Analytic framework. Framework for targeted maximum likelihood estimation (TMLE) of causal impacts.] [Figure 2: Causal diagram. a) Conceptual model for the emergence of AMR with observed and unobserved features and b) causal diagram for inference model.] Results Our final cohort consisted of 4,573 patients with no baseline AMR or antibiotic exposure in the previous 12 months who were treated with NIT, FQs or oral beta-lactams. XGBoost models significantly outperformed other model types.
Compared to other antibiotics, the ATE of NIT exposure to NIT resistance at 12 months was 0.05 (0.04–0.07) and for FQ resistance was 0.06 (0.05–0.08). Exposure to NIT had no impact on the risk of resistance to AMC at 12 months. Exposure to FQs had no impact on resistance to FQs, NIT or AMC at 12 months (Figure 3). [Figure 3: Average treatment effects. ATEs for the impact of NIT or FQs on the risk of AMR to NIT, FQs or AMC at 12 months using a) logistic regression, b) random forest, c) multilayer perceptron and d) XGBoost models.] Conclusion Outpatients treated with NIT had a higher risk of AMR at 12 months than those treated with FQs. Future work will focus on including hospital exposures and immunosuppression into models and infer impact using a wider range of treatments. Disclosures Sanjat Kanjilal, MD, MPH: GlaxoSmithKline (Advisor/Consultant); Roche Diagnostics (Honoraria). David Sontag, PhD: Adobe (Grant/Research Support); ASAPP (Advisor/Consultant); Cureai Health (Stocks/Bonds); Facebook (Grant/Research Support); Google (Grant/Research Support); IBM (Grant/Research Support); SAP (Grant/Research Support); Takeda (Grant/Research Support).

2022-11-30 — Super Learner Ensemble for Sound Classification using Spectral Features

Authors: Luana Gantert, Matteo Sammarco, Marcin Detyniecki, Miguel Elias M. Campista
Year: 2022
Publication Date: 2022-11-30
Venue: IEEE Latin-American Conference on Communications
DOI: 10.1109/LATINCOM56090.2022.10000704
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Audio samples have emerged as a trend for monitoring and improving decision-making in smart cities, medical applications, and environmental event detections. This paper proposes a Super Learner ensemble application in two scenarios: to distinguish urban from domestic sounds, and detect abnormal samples in industrial machines. The Super Learner combines supervised classifiers to detect abnormal samples or determine a class of an event from spectral features extracted from original sounds. We study the impact on time processing and performance of varying the number of K-folds in the cross-validation step using the Environmental Sound Classification (ESC-50) and Malfunctioning Industrial Machine Investigation and Inspection (MIMII) datasets. The performance evaluation demonstrates that RF is the best classifier in the ESC-50 dataset and SVM in the MIMII dataset. However, the Super Learner reaches AUC and F1-Score values near the best algorithm in the majority of cases analyzed, representing the best tradeoff solution.

2022-11-28 — Change talk subtypes as predictors of alcohol use following brief motivational intervention.

Authors: C. Kahler, T. Janssen, S. Gruber, C. Howe, M. B. Laws, J. Walthers, M. Magill, N. Mastroleo, P. Monti
Year: 2022
Publication Date: 2022-11-28
Venue: Psychology of Addictive Behaviors
DOI: 10.1037/adb0000898
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
OBJECTIVE To examine the relative importance of client change language subtypes as predictors of alcohol use following motivational interviewing (MI). METHOD Participants were 164 heavy drinkers (57.3% female, mean age = 28.5 years, 13.4% Hispanic/Latinx, 82.9% White) recruited during an emergency department visit who received MI for alcohol and human immunodeficiency virus/sexual risk in a randomized-controlled trial. MI sessions were coded with the motivational interviewing skill code (MISC) and the generalized behavioral intervention analysis system (GBIAS). Variable importance analyses used targeted maximum likelihood estimation to rank order change language subtypes defined by these systems as predictors of alcohol use over 9 months of follow-up. RESULTS Among GBIAS change language subtypes, higher sustain talk (ST) around change planning was ranked the most important predictor of drinks per week (b = -5.57, 95% CI [-8.11, -3.02]) and heavy drinking days (b = -2.07, 95% CI [-3.17, -0.98]); this talk reflected (a) rejection of alcohol abstinence as a desired change goal, (b) rejection of specific change strategies, or (c) discussion of anticipated challenges in changing drinking. Among MISC change language subtypes, higher ST around taking steps (reflecting recent escalations in drinking described by a small minority of participants) was ranked the most important predictor of drinks per week (b = 22.71, 95% CI [20.29, 25.13]) and heavy drinking days (b = 2.45, 95% CI [1.68, 3.21]). CONCLUSIONS Results challenge the assumption that all ST during MI is a negative prognostic indicator and highlight the importance of the context in which change language emerges. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

2022-11-22 — Efficient targeted learning of heterogeneous treatment effects for multiple subgroups

Authors: Waverly Wei, M. Petersen, M. J. van der Laan, Zeyu Zheng, Chong Wu, Jingshen Wang
Year: 2022
Publication Date: 2022-11-22
Venue: Biometrics
DOI: 10.1111/biom.13800
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treatment effects via parametric modeling and can thus be susceptible to model mis‐specifications. In this paper, we take a model‐free semiparametric perspective and aim to efficiently evaluate the heterogeneous treatment effects of multiple subgroups simultaneously under the one‐step targeted maximum‐likelihood estimation (TMLE) framework. When the number of subgroups is large, we further expand this path of research by looking at a variation of the one‐step TMLE that is robust to the presence of small estimated propensity scores in finite samples. From our simulations, our method demonstrates substantial finite sample improvements compared to conventional methods. In a case study, our method unveils the potential treatment effect heterogeneity of rs12916‐T allele (a proxy for statin usage) in decreasing Alzheimer's disease risk.

2022-11-12 — Coupling Process-Based Models and Machine Learning Algorithms for Predicting Yield and Evapotranspiration of Maize in Arid Environments

Authors: A. Attia, A. Govind, A. S. Qureshi, T. Feike, M. Rizk, M. Shabana, A. Kheir
Year: 2022
Publication Date: 2022-11-12
Venue: Water
DOI: 10.3390/w14223647
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Crop yield prediction is critical for investigating the yield gap and potential adaptations to environmental and management factors in arid regions. Crop models (CMs) are powerful tools for predicting yield and water use, but they still have some limitations and uncertainties; therefore, combining them with machine learning algorithms (MLs) could improve predictions and reduce uncertainty. To that end, the DSSAT-CERES-maize model was calibrated in one location and validated in others across Egypt with varying agro-climatic zones. Following that, the dynamic model (CERES-Maize) was used for long-term simulation (1990–2020) of maize grain yield (GY) and evapotranspiration (ET) under a wide range of management and environmental factors. Detailed outputs from three growing seasons of field experiments in Egypt, as well as CERES-maize outputs, were used to train and test six machine learning algorithms (linear regression, ridge regression, lasso regression, K-nearest neighbors, random forest, and XGBoost), resulting in more than 1.5 million simulated yield and evapotranspiration scenarios. Seven warming years (i.e., 1991, 1998, 2002, 2005, 2010, 2013, and 2020) were chosen from a 31-year dataset to test MLs, while the remaining 24 years were used to train the models. The Ensemble model (super learner) and XGBoost outperform other models in predicting GY and ET for maize, as evidenced by R2 values greater than 0.82 and RRMSE less than 9%. The broad range of management practices, when averaged across all locations and 31 years of simulation, not only reduced the hazard impact of environmental factors but also increased GY and reduced ET.
Moving beyond prediction and interpreting the outputs from Lasso and XGBoost, and using global and local SHAP values, we found that the most important features for predicting GY and ET are maximum temperatures, minimum temperature, available water content, soil organic carbon, irrigation, cultivars, soil texture, solar radiation, and planting date. Determining the most important features is critical for assisting farmers and agronomists in prioritizing such features over other factors in order to increase yield and resource efficiency values. The combination of CMs and ML algorithms is a powerful tool for predicting yield and water use in arid regions, which are particularly vulnerable to climate change and water scarcity.

2022-11-08 — Eligibility criteria vs. need for pre-exposure prophylaxis: a reappraisal among men who have sex with men in Amsterdam, the Netherlands

Authors: Feline de la Court, A. Boyd, U. Davidovich, E. Hoornenborg, M. F. Schim van der Loeff, H. D. de Vries, D. V. van Wees, B. V. van Benthem, M. Xiridou, A. Matser, M. Prins
Year: 2022
Publication Date: 2022-11-08
Venue: Epidemiology and Infection
DOI: 10.1017/S0950268822001741
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
To reappraise pre-exposure prophylaxis (PrEP) eligibility criteria towards the men who have sex with men (MSM) with highest HIV-risk, we assessed PrEP need (i.e. HIV-risk) using Amsterdam Cohort Studies data from 2011–2017 for all non-PrEP using MSM. Outcomes were incident HIV-infection and newly-diagnosed anal STI. Determinants were current PrEP eligibility criteria (anal STI and condomless anal sex (CAS)) and additional determinants (age, education, group sex, alcohol use during sex and chemsex). We used targeted maximum likelihood estimation (TMLE) to estimate the relative risk (RR) and 95% confidence intervals (CI) of determinants on outcomes, and calculated population attributable fractions (PAFs) with 95% CI using RRs from TMLE. Among 810 included MSM, 22 HIV-infections and 436 anal STIs (n = 229) were diagnosed during follow-up. Chemsex (RR = 5.8 (95% CI 2.0–17.0); PAF = 55.3% (95% CI 43.3–83.4)), CAS with a casual partner (RR = 3.3 (95% CI 1.3–8.7); PAF = 38.0% (95% CI 18.3–93.6)) and anal STI (RR = 5.3 (95% CI 1.7–16.7); PAF = 22.0% (95% CI −16.8 to 100.0)) were significantly (P < 0.05) associated with and had highest attributable risk fractions for HIV. Chemsex (RR = 2.0 (95% CI 1.6–2.4); PAF = 19.5% (95% CI 10.6–30.6)) and CAS with a casual partner (RR = 2.5 (95% CI 2.0–3.0); PAF = 28.0% (95% CI 21.0–36.4)) were also significantly associated with anal STI, as was younger age (16–34/≥35; RR = 1.7 (95% CI 1.4–2.1); PAF = 15.5% (95% CI 6.4–27.6)) and group sex (RR = 1.3 (95% CI 1.1–1.6); PAF = 9.0% (95% CI −2.3 to 23.7)). Chemsex should be an additional PrEP eligibility criterion.

2022-11-08 — Abstract 14740: Racial Disparities in Adverse Pregnancy Outcomes and Cardiovascular Health After Delivery: A Mediation Analysis in Numom2b-hhs

Authors: Lucia C. Petito, Xiaoning Huang, C. B. Bairey Merz, N. Bello, J. Catov, Judith Chung, Abbi D. Lane-Cordova, D. Haas, L. Levine, R. McNeil, Eliza C. Miller, G. Saade, Lauren Theilen, Laura E Wiener, L. Yee, P. Greenland, W. Grobman, S. Khan
Year: 2022
Publication Date: 2022-11-08
Venue: Circulation
DOI: 10.1161/circ.146.suppl_1.14740
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Introduction: Racial disparities in risk of adverse pregnancy outcomes (APOs) and cardiovascular health (CVH) after delivery are well-established. Therefore, we sought to quantify the extent to which APOs explained racial differences in CVH. Methods: We included non-Hispanic (NH) Black and NH White individuals from the prospective, longitudinal nuMoM2b-HHS cohort. Race and ethnicity, which represent social constructs, were self-reported. APOs (hypertensive disorders of pregnancy [HDP], small for gestational age [SGA], preterm birth [PTB], and gestational diabetes mellitus [GDM]) were centrally adjudicated via medical records. The primary outcome was CVH score based on 4 metrics: body mass index, blood pressure, cholesterol, and glucose assessed at follow-up. Using a life-course approach, APOs were considered on the pathway between racial identity and CVH. Mean difference in CVH score was estimated via targeted maximum likelihood estimation (TMLE). Sensitivity analyses included alternative causal inference methods. All models were adjusted for age, study site, insurance, and fetal sex. Results: Among 2,987 birthing individuals, NH Black individuals were significantly more likely than NH White individuals to experience HDP, SGA, and PTB, and had lower (worse) CVH scores at follow-up (mean difference 0.52 [0.38-0.66]). Counterfactual disparity measures were estimated, which represent the adjusted racial difference in CVH score that would remain if no one experienced a particular APO ( Table ). Approximately 2% of the racial difference in CVH scores was due to racial differences in APOs when estimating with TMLE. Similar estimates were observed with other methods. Conclusions: A small proportion of the racial disparity in CVH years after delivery was mediated via APOs. Mitigating racial inequities in CVH will require interventions upstream of the first pregnancy (before an APO occurs) in addition to downstream risk factor control.

2022-11-07 — Targeted maximum likelihood estimation for causal inference in survival and competing risks analysis

Authors: H. Rytgaard, M. J. van der Laan
Year: 2022
Publication Date: 2022-11-07
Venue: Lifetime Data Analysis
DOI: 10.1007/s10985-022-09576-2
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2022-11-07 — Effect of a one-time financial incentive on linkage to chronic hypertension care in Kenya and Uganda: A randomized controlled trial

Authors: Matthew D. Hickey, A. Owaraganise, N. Sang, Fredrick J Opel, E. W. Mugoma, J. Ayieko, J. Kabami, G. Chamie, E. Kakande, M. Petersen, L. Balzer, M. Kamya, D. Havlir
Year: 2022
Publication Date: 2022-11-07
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0277312
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background Fewer than 10% of people with hypertension in sub-Saharan Africa are diagnosed, linked to care, and achieve hypertension control. We hypothesized that a one-time financial incentive and phone call reminder for missed appointments would increase linkage to hypertension care following community-based screening in rural Uganda and Kenya. Methods In a randomized controlled trial, we conducted community-based hypertension screening and enrolled adults ≥25 years with blood pressure ≥140/90 mmHg on three measures; we excluded participants with known hypertension or hypertensive emergency. The intervention was transportation reimbursement upon linkage (~$5 USD) and up to three reminder phone calls for those not linking within seven days. Control participants received a clinic referral only. Outcomes were linkage to hypertension care within 30 days (primary) and hypertension control <140/90 mmHg measured in all participants at 90 days (secondary). We used targeted minimum loss-based estimation to compute adjusted risk ratios (aRR). Results We screened 1,998 participants, identifying 370 (18.5%) with uncontrolled hypertension and enrolling 199 (100 control, 99 intervention). Reasons for non-enrollment included prior hypertension diagnosis (n = 108) and hypertensive emergency (n = 32). Participants were 60% female, median age 56 (range 27–99); 10% were HIV-positive and 42% had baseline blood pressure ≥160/100 mmHg. Linkage to care within 30 days was 96% in intervention and 66% in control (aRR 1.45, 95%CI 1.25–1.68). Hypertension control at 90 days was 51% intervention and 41% control (aRR 1.22, 95%CI 0.92–1.66). Conclusion A one-time financial incentive and reminder call for missed visits resulted in a 30% absolute increase in linkage to hypertension care following community-based screening. Financial incentives can improve the critical step of linkage to care for people newly diagnosed with hypertension in the community.
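
As a quick arithmetic check on the linkage figures above, the sketch below recomputes the absolute difference and a crude (unadjusted) risk ratio from the rounded percentages in the abstract. It is not the trial's targeted minimum loss-based analysis; the variable names are illustrative, and the crude ratio only happens to agree with the reported aRR to two decimals.

```python
# Rounded 30-day linkage proportions as reported in the abstract.
linkage_intervention = 0.96
linkage_control = 0.66

# Absolute difference in linkage, in proportion units (x100 for percentage points).
risk_difference = linkage_intervention - linkage_control

# Crude risk ratio; the abstract's aRR is additionally covariate-adjusted.
crude_risk_ratio = linkage_intervention / linkage_control

print(f"risk difference: {risk_difference:.2f}")
print(f"crude risk ratio: {crude_risk_ratio:.2f}")
```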

2022-11-04 — Exploring the Influencing Factors in Identifying Soil Texture Classes Using Multitemporal Landsat-8 and Sentinel-2 Data

Authors: Ya’nan Zhou, Wei Wu, Hongbin Liu
Year: 2022
Publication Date: 2022-11-04
Venue: Remote Sensing
DOI: 10.3390/rs14215571
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Soil texture is a key soil property driving physical, chemical, biological, and hydrological processes in soils. The rapid development of remote sensing techniques shows great potential for mapping soil properties. This study highlights the effectiveness of multitemporal remote sensing data in identifying soil textural class by using retrieved vegetation properties as proxies of soil properties. The impacts of sensors, modeling resolutions, and modeling techniques on the accuracy of soil texture classification were explored. Multitemporal Landsat-8 and Sentinel-2 images were individually acquired at the same time periods. Three satellite-based experiments with different inputs, i.e., Landsat-8 data, Sentinel-2 data (excluding red-edge parameters), and Sentinel-2 data (including red-edge parameters) were conducted. Modeling was carried out at three spatial resolutions (10, 30, 60 m) using five machine-learning (ML) methods: random forest, support vector machine, gradient-boosting decision tree, categorical boosting, and super learner that combined the four former classifiers based on the stacking concept. In addition, a novel SHapley Additive exPlanations (SHAP) technique was introduced to explain the outputs of the ML model. The results showed that the sensors, modeling resolutions, and modeling techniques significantly affected the prediction accuracy. The models using Sentinel-2 data with red-edge parameters performed consistently best. The models usually gave better results at fine (10 m) and medium (30 m) modeling resolutions than at a coarse (60 m) resolution. The super learner provided higher accuracies than other modeling techniques and gave the highest values of overall accuracy (0.8429), kappa (0.7611), precision (0.8378), recall rate (0.8393), and F1-score (0.8398) at 30 m with Sentinel-2 data involving red-edge parameters.
The SHAP technique quantified the contribution of each variable for different soil textural classes, revealing the critical roles of red-edge parameters in separating loamy soils. This study provides comprehensive insights into the effective modeling of soil properties on various scales using multitemporal optical images.

2022-11-01 — Reprint of "Using multiple imputation by super learning to assign intent to nonfatal firearm injuries".

Authors: Thomas Carpenito, Matthew Miller, J. Manjourides, D. Azrael
Year: 2022
Publication Date: 2022-11-01
Venue: Preventive Medicine
DOI: 10.1016/j.ypmed.2022.107324
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2022-11-01 — Ensemble super learner based genotoxicity prediction of multi-walled carbon nanotubes

Authors: B. Latha, Sheena Christabel Pravin, J. Saranya, E. Manikandan
Year: 2022
Publication Date: 2022-11-01
Venue: Computational Toxicology
DOI: 10.1016/j.comtox.2022.100244
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2022-11-01 — Assessment of Glucose Lowering Medications’ Effectiveness for Cardiovascular Clinical Risk Management of Real-World Patients with Type 2 Diabetes: Targeted Maximum Likelihood Estimation under Model Misspecification and Missing Outcomes

Authors: V. Sciannameo, G. Fadini, D. Bottigliengo, A. Avogaro, I. Baldi, D. Gregori, P. Berchialla
Year: 2022
Publication Date: 2022-11-01
Venue: International Journal of Environmental Research and Public Health
DOI: 10.3390/ijerph192214825
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
The results from many cardiovascular (CV) outcome trials suggest that glucose lowering medications (GLMs) are effective for the CV clinical risk management of type 2 diabetes (T2D) patients. The aim of this study is to compare the effectiveness of two GLMs (SGLT2i and GLP-1RA) for the CV clinical risk management of T2D patients in a real-world setting, by simultaneously reducing glycated hemoglobin, body weight, and systolic blood pressure. Data from the real-world Italian multicenter retrospective study Dapagliflozin Real World evideNce in Type 2 Diabetes (DARWIN-T2D) are analyzed. Different statistical approaches are compared to deal with the real-world-associated issues, which can arise from model misspecification, nonrandomized treatment assignment, and a high percentage of missingness in the outcome, and can potentially bias the marginal treatment effect (MTE) estimate and thus have an influence on the clinical risk management of patients. We compare the logistic regression (LR), propensity score (PS)-based methods, and the targeted maximum likelihood estimator (TMLE), which allows for the use of machine learning (ML) models. Furthermore, a simulation study is performed, resembling the structure of the conditional dependencies among the main variables in DARWIN-T2D. LR and PS methods do not reveal any difference in the effectiveness regarding the attainment of combined CV risk factor goals between the two treatments. TMLE suggests instead that dapagliflozin is significantly more effective than GLP-1RA for the CV risk management of T2D patients. The results from the simulation study suggest that TMLE has the lowest bias and SE for the estimate of the MTE.

2022-10-31 — Adaptive selection of the optimal strategy to improve precision and power in randomized trials

Authors: L. Balzer, E. Cai, Lucas Godoy Garraza, Pracheta Amaranath
Year: 2022
Publication Date: 2022-10-31
Venue: Biometrics
DOI: 10.1093/biomtc/ujad034
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: how to select the adjustment approach (which variables and in which form) to maximize precision, while maintaining Type-I error control. Balzer et al. previously proposed Adaptive Pre-specification within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials (N < 40). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models, adjusting for a single covariate. Now, we tailor Adaptive Pre-specification to trials with many randomized units. Using V-fold cross-validation and the estimated influence curve-squared as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. As assessed in simulations exploring a variety of data-generating processes, our approach maintains Type-I error control (under the null) and offers substantial gains in precision, equivalent to 20%-43% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements overall and within subgroups.

2022-10-18 — Uncovering Heterogeneous Associations Between Disaster-Related Trauma and Subsequent Functional Limitations: A Machine-Learning Approach.

Authors: K. Shiba, Adel Daoud, H. Hikichi, A. Yazawa, J. Aida, K. Kondo, I. Kawachi
Year: 2022
Publication Date: 2022-10-18
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwac187
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
This study examined heterogeneity in the association between disaster-related home loss and functional limitations of older adults and identified characteristics of vulnerable sub-populations. Data were from a prospective cohort study of Japanese older survivors of the 2011 Japan Earthquake. Complete home loss was objectively assessed. Outcomes in 2013 (n=3,350) and 2016 (n=2,664) included certified physical disability levels, self-reported Activities of Daily Living, and Instrumental Activities of Daily Living. We estimated population average associations between home loss and functional limitations via targeted maximum likelihood estimation with Super Learning, and heterogeneity in those associations via the generalized random forest algorithm. We adjusted for 55 survivor characteristics from the baseline survey conducted seven months before the disaster. While home loss was consistently associated with increased functional limitations on average, there was evidence of effect heterogeneity for all outcomes. Comparing the most and least vulnerable groups, the most vulnerable group tended to be older, not married, living alone, and not working, with pre-existing health problems before the disaster. Individuals who were less educated but had higher income also appeared vulnerable for some outcomes. Our inductive approach to effect heterogeneity using machine learning algorithms uncovered large and complex heterogeneity in post-disaster functional limitations among Japanese older survivors.

2022-10-07 — Efficient and robust approaches for analysis of sequential multiple assignment randomized trials: Illustration using the ADAPT‐R trial

Authors: L. Montoya, Michael R. Kosorok, E. Geng, Joshua Schwab, T. Odeny, M. Petersen
Year: 2022
Publication Date: 2022-10-07
Venue: Biometrics
DOI: 10.1111/biom.13808
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Personalized intervention strategies, in particular those that modify treatment based on a participant's own response, are a core component of precision medicine approaches. Sequential multiple assignment randomized trials (SMARTs) are growing in popularity and are specifically designed to facilitate the evaluation of sequential adaptive strategies, in particular those embedded within the SMART. Advances in efficient estimation approaches that are able to incorporate machine learning while retaining valid inference can allow for more precise estimates of the effectiveness of these embedded regimes. However, to the best of our knowledge, such approaches have not yet been applied as the primary analysis in SMART trials. In this paper, we present a robust and efficient approach using targeted maximum likelihood estimation (TMLE) for estimating and contrasting expected outcomes under the dynamic regimes embedded in a SMART, together with generating simultaneous confidence intervals for the resulting estimates. We contrast this method with two alternatives (G‐computation and inverse probability weighting estimators). The precision gains and robust inference achievable through the use of TMLE to evaluate the effects of embedded regimes are illustrated using both outcome‐blind simulations and a real‐data analysis from the Adaptive Strategies for Preventing and Treating Lapses of Retention in Human Immunodeficiency Virus (HIV) Care (ADAPT‐R) trial (NCT02338739), a SMART with a primary aim of identifying strategies to improve retention in HIV care among people living with HIV in sub‐Saharan Africa.

2022-09-29 — A novel machine learning approach to predict the export price of seafood products based on competitive information: The case of the export of Vietnamese shrimp to the US market

Authors: Nguyen Minh Khiem, Yuki Takahashi, H. Yasuma, Khuu Thi Phuong Dong, T. N. Hải, N. Kimura
Year: 2022
Publication Date: 2022-09-29
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0275290
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Predicting the export price of shrimp is important for Vietnam’s fisheries. It not only promotes product quality but also helps policy makers determine strategies to develop the national shrimp industry. Competition in global markets is considered to be an important factor, one that significantly influences price. In this study, we predicted trends in the export price of Vietnamese shrimp based on competitive information from six leading exporters (China, India, Indonesia, Thailand, Ecuador, and Chile) who, alongside Vietnam, also export shrimp to the US. The prediction was based on a dataset collected from the US Department of Agriculture (USDA), the Food and Agriculture Organization of the United Nations (FAO), and the World Trade Organization (WTO) (May-1995 to May-2019) that included price, required farming certificates, and disease outbreak data. A super learner technique, which combined 10 single algorithms, was used to make predictions in selected base periods (3, 6, 9, and 12 months). It was found that the super learner obtained results in all base periods that were more accurate and stable than any candidate algorithms. The impacts of variables in the predictive model were interpreted by a SHapley Additive exPlanations (SHAP) analysis to determine their influence on the price of Vietnamese exports. The price of Indian, Thai, and Chinese exports highlighted the advantages of being a World Trade Organization member and the disadvantages of the prevalence of shrimp disease in Vietnam, which has had a significant impact on the Vietnamese shrimp export price.

2022-09-27 — Influence of climate and environment on the efficacy of water, sanitation, and handwashing interventions on diarrheal disease in rural Bangladesh: a re-analysis of a randomized control trial

Authors: A. Nguyen, J. Grembi, M. Riviere, Gabriella Barratt Heitmann, D. William Hutson, T. Athni, Arusha Patil, A. Ercumen, A. Lin, Y. Crider, Andrew Mertens, L. Unicomb, Mahbubur Rahman, J. Colford, S. Luby, B. Arnold, J. Benjamin-Chung
Year: 2022
Publication Date: 2022-09-27
Venue: medRxiv
DOI: 10.1101/2022.09.25.22280229
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Climate change may influence the effectiveness of environmental interventions. We investigated if climate and environment modified the effect of low-cost, point-of-use water, sanitation, and handwashing (WASH) interventions on diarrhea and predicted intervention effectiveness under climate change scenarios. Methods: We analyzed data from a cluster-randomized trial in rural Bangladesh that measured diarrhea prevalence in children 0-2 years from 2012-2016. We matched remote sensing data on temperature, precipitation, humidity, and surface water to households by location and measurement date. We estimated prevalence ratios (PR) for WASH interventions vs. control stratified by environmental factors using generalized additive models and targeted maximum likelihood estimation. We estimated intervention effects under predicted precipitation in the study region in 2050 for climate change scenarios from different Shared Socioeconomic Pathways (SSPs). Findings: WASH interventions more effectively prevented diarrhea under higher levels of total precipitation in the previous week and when there was heavy rain in the previous week (heavy rainfall PR = 0.38, 95% CI 0.23-0.62 vs. no heavy rainfall PR = 0.77, 0.60-0.98). We did not detect substantial effect modification by other environmental variables. WASH intervention effectiveness increased under most climate change scenarios; in a fossil-fueled development scenario (SSP5), the PR was 0.46 (0.44-0.48) compared to 0.67 (0.65-0.68) in the study. Interpretation: WASH interventions had the strongest effect on diarrhea under higher precipitation, and effectiveness may increase under climate change without sustainable development. WASH interventions may improve population resilience to climate-related health risks. Funding: Bill & Melinda Gates Foundation, National Institute of Allergy and Infectious Diseases, National Heart, Lung, And Blood Institute

2022-09-23 — 'haldensify': Highly adaptive lasso conditional density estimation in 'R'

Authors: N. Hejazi, M. J. Laan, David C. Benkeser
Year: 2022
Publication Date: 2022-09-23
Venue: Journal of Open Source Software
DOI: 10.21105/joss.04522
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
The haldensify R package serves as a toolbox for nonparametric conditional density estimation based on the highly adaptive lasso, a flexible nonparametric algorithm for the estimation of functional statistical parameters (e.g., conditional mean, hazard, density). Building upon an earlier proposal (Díaz & van der Laan, 2011), haldensify leverages the relationship between the hazard and density functions to estimate the latter by applying pooled hazard regression to a synthetic repeated measures dataset created from the input data, relying upon the framework of cross-validated loss-based estimation to yield an optimal estimator (Dudoit & van der Laan, 2005; van der Laan et al., 2004). While conditional density estimation is a fundamental problem in statistics, arising naturally in a variety of applications (including machine learning), it plays a critical role in estimating the causal effects of continuous- or ordinal-valued treatments. In such settings this covariate-conditional treatment density has been termed the generalized propensity score (Hirano & Imbens, 2004; Imai & Van Dyk, 2004), and, like its analog for binary treatments (Rosenbaum & Rubin, 1983), serves as a key ingredient in developing both inverse probability weighted and doubly robust estimators of causal effects (Díaz & van der Laan, 2012, 2018; Haneuse & Rotnitzky, 2013; Hejazi et al., 2022).

2022-09-18 — Predicting cumulative lead (Pb) exposure using the Super Learner algorithm

Authors: Xin Wang, K. Bakulski, B. Mukherjee, Howard Hu, S. Park
Year: 2022
Publication Date: 2022-09-18
Venue: Chemosphere
DOI: 10.1016/j.chemosphere.2022.137125
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Chronic lead (Pb) exposure causes long term health effects. While recent exposure can be assessed by measuring blood lead (half-life 30 days), chronic exposures can be assessed by measuring lead in bone (half-life of many years to decades). Bone lead measurements, in turn, have been measured non-invasively in large population-based studies using x-ray fluorescence techniques, but the method remains limited due to technical availability, expense, and the need for licensing radioactive materials used by the instruments. Thus, we developed prediction models for bone lead concentrations using a flexible machine learning approach–Super Learner, which combines the predictions from a set of machine learning algorithms for better prediction performance. The study population included 695 men in the Normative Aging Study, aged 48 years and older, whose bone (patella and tibia) lead concentrations were directly measured using K-shell-X-ray fluorescence. Ten predictors (blood lead, age, education, job type, weight, height, body mass index, waist circumference, cumulative cigarette smoking (pack-year), and smoking status) were selected for patella lead and 11 (the same 10 predictors plus serum phosphorus) for tibia lead using the Boruta algorithm. We implemented Super Learner to predict bone lead concentrations by calculating a weighted combination of predictions from 8 algorithms. In the nested cross-validation, the correlation coefficients between measured and predicted bone lead concentrations were 0.58 for patella lead and 0.52 for tibia lead, which has improved the correlations obtained in previously-published linear regression-based prediction models. We evaluated the applicability of these prediction models to the National Health and Nutrition Examination Survey for the associations between predicted bone lead concentrations and blood pressure, and positive associations were observed. 
These bone lead prediction models provide reasonable accuracy and can be used to evaluate health effects of cumulative lead exposure in studies where bone lead is not measured.

2022-09-18 — Estimating spatiotemporally resolved PM2.5 concentration across the contiguous United States using Super learning

Authors: A. Shtein, J. Schwartz
Year: 2022
Publication Date: 2022-09-18
Venue: ISEE Conference Abstracts
DOI: 10.1289/isee.2022.p-0890
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2022-09-15 — Productivity regression analysis of cutter suction dredger considering operating characteristics and equipment status

Authors: G. Shang, Liyun Xu, Jinzhu Tian, Dongwei Cai
Year: 2022
Publication Date: 2022-09-15
Venue: Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment
DOI: 10.1177/14750902221121915
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In order to optimize the operation parameters of cutter suction dredger in real time and adjust productivity as needed, a construction optimization strategy based on real-time productivity regression analysis is proposed. Machine learning methods, including Support Vector Regression (SVR), Gradient Boosting Regression Tree (GBRT), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and a Super Learner made up of them, are used to mine relevant features based on the big data of operation characteristics and equipment status. Firstly, the working principle of cutter suction dredger is analyzed, the features that need real-time monitoring are determined, and the above features are classified. Then, some missing values and outliers in the data are deleted. Next, Lasso method is used to eliminate the variables that are not related to the regression target, and the redundant variables are combined. In addition, five machine learning methods are used to train and test the off-line productivity data of cutter suction dredger. And they are used to fit recent online productivity data. Super Learner performed best, which achieved the highest R² (0.917), the lowest RMSE (75.096) and MAE (61.422) in the five models for online regression. Furthermore, the calculation time of each model is discussed, and the feasibility of the method proposed in this study for real-time regression of online productivity data has been confirmed. Finally, the importance of characteristics is analyzed to provide guidance for dredging operations under restricted construction conditions. According to the regression results and the importance of features, operators can give priority to adjusting some features to adjust the real-time construction productivity of dredger.

2022-09-15 — Development and validation of prediction models for gestational diabetes treatment modality using supervised machine learning: a population-based cohort study

Authors: Lauren D. Liao, A. Ferrara, M. Greenberg, Amanda L. Ngo, Juanran Feng, Zhenhua Zhang, P. Bradshaw, A. Hubbard, Yeyi Zhu
Year: 2022
Publication Date: 2022-09-15
Venue: BMC Medicine
DOI: 10.1186/s12916-022-02499-7
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Gestational diabetes (GDM) is prevalent and benefits from timely and effective treatment, given the short window to impact glycemic control. Clinicians face major barriers to choosing effectively among treatment modalities [medical nutrition therapy (MNT) with or without pharmacologic treatment (antidiabetic oral agents and/or insulin)]. We investigated whether clinical data at varied stages of pregnancy can predict GDM treatment modality. Among a population-based cohort of 30,474 pregnancies with GDM delivered at Kaiser Permanente Northern California in 2007–2017, we selected those in 2007–2016 as the discovery set and 2017 as the temporal/future validation set. Potential predictors were extracted from electronic health records at different timepoints (levels 1–4): (1) 1-year preconception to the last menstrual period, (2) the last menstrual period to GDM diagnosis, (3) at GDM diagnosis, and (4) 1 week after GDM diagnosis. We compared transparent and ensemble machine learning prediction methods, including least absolute shrinkage and selection operator (LASSO) regression and super learner, containing classification and regression tree, LASSO regression, random forest, and extreme gradient boosting algorithms, to predict risks for pharmacologic treatment beyond MNT. The super learner using levels 1–4 predictors had higher predictability [tenfold cross-validated C-statistic in discovery/validation set: 0.934 (95% CI: 0.931–0.936)/0.815 (0.800–0.829)], compared to levels 1, 1–2, and 1–3 (discovery/validation set C-statistic: 0.683–0.869/0.634–0.754). A simpler, more interpretable model, including timing of GDM diagnosis, diagnostic fasting glucose value, and the status and frequency of glycemic control at fasting during one-week post diagnosis, was developed using tenfold cross-validated logistic regression based on super learner-selected predictors. 
This model compared to the super learner had only a modest reduction in predictability [discovery/validation set C-statistic: 0.825 (0.820–0.830)/0.798 (95% CI: 0.783–0.813)]. Clinical data demonstrated reasonably high predictability for GDM treatment modality at the time of GDM diagnosis and high predictability at 1-week post GDM diagnosis. These population-based, clinically oriented models may support algorithm-based risk-stratification for treatment modality, inform timely treatment, and catalyze more effective management of GDM.

2022-08-30 — Tree-based Subgroup Discovery In Electronic Health Records: Heterogeneity of Treatment Effects for DTG-containing Therapies

Authors: Jiabei Yang, Anne Mwangi, R. Kantor, I. Dahabreh, M. Nyambura, A. Delong, J. Hogan, J. Steingrimsson
Year: 2022
Publication Date: 2022-08-30
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
The rich longitudinal individual level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss-to-follow-up due to dropout. Here, we develop the Subgroup Discovery for Longitudinal Data (SDLD) algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data by combining the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus (HIV) who are at higher risk of weight gain when receiving dolutegravir-containing antiretroviral therapies (ARTs) versus when receiving non dolutegravir-containing ARTs.

2022-08-23 — Developing a Targeted Learning-Based Statistical Analysis Plan

Authors: Susan Gruber, Hana Lee, Rachael V. Phillips, M. Ho, M. J. van der Laan
Year: 2022
Publication Date: 2022-08-23
Venue: Statistics in Biopharmaceutical Research
DOI: 10.1080/19466315.2022.2116104
Link: Semantic Scholar
Matched Keywords: super learning, targeted minimum loss based estimation, tmle

Abstract:
The Targeted Learning estimation roadmap provides a rigorous framework for developing a statistical analysis plan (SAP) for synthesizing evidence from randomized controlled trials and real world data. Learning from these data necessitates acknowledging potential sources of bias, and specifying appropriate mitigation strategies. This article demonstrates how Targeted Learning informs different aspects of SAP development, including explicit representation of intercurrent events. Guiding principles are to (a) define the target parameter of interest separately from the model or estimation procedure; and (b) use targeted minimum loss-based estimation (TMLE) and super learning for causal inference. These flexible methodologies can be entirely pre-specified while remaining data adaptive; and (c) carry out a nonparametric sensitivity analysis to evaluate the plausibility of a causal interpretation of the estimated treatment effect, and its stability with respect to violations of underlying causal assumptions. The roadmap promotes the principles and practices set forth in the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use Guideline. An annotated SAP, checklists for pre-specifying the TMLE and super learning procedures, and sample R code are provided as supplementary materials.

2022-08-19 — The Association Between Social Network Characteristics and Tuberculosis Infection Among Adults in Nine Rural Ugandan Communities.

Authors: C. Marquez, Yiqun T. Chen, Mucunguzi Atukunda, G. Chamie, L. Balzer, J. Kironde, E. Ssemmondo, F. Mwangwa, J. Kabami, A. Owaraganise, E. Kakande, Rachel Abbott, Bob Ssekyanzi, Catherine A Koss, M. Kamya, E. Charlebois, D. Havlir, M. Petersen
Year: 2022
Publication Date: 2022-08-19
Venue: Clinical Infectious Diseases
DOI: 10.1093/cid/ciac669
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND Social network analysis can elucidate Tuberculosis (TB) transmission dynamics outside of the home and may inform novel network-based case-finding strategies. METHODS We assessed the association between social network characteristics and prevalent TB infection among residents (≥15 years) of 9 rural communities in Eastern Uganda. Social contacts named during a census were used to create community-specific non-household social networks. We evaluated whether social network structure and characteristics of first-degree contacts (gender, HIV status, TB infection) were associated with prevalent TB infection (positive TST) after adjusting for individual-level risk factors (age, gender, HIV status, TB contact, wealth, occupation, and BCG vaccination) with Targeted Maximum Likelihood Estimation. RESULTS Among 3,335 residents sampled for TST, 32% had a positive TST, 4% reported a TB contact. The social network contained 15,328 first-degree contacts. Persons with the most network centrality (top 10%) (aRR: 1.3 (1.1-1.1) and the most (top 10%) male contacts (aRR: 1.5 (95% CI: 1.3-1.9) had a higher risk of prevalent TB, compared to those in the remaining 90%. People with ≥1 contacts with HIV (aRR 1.3; 95% CI:1.1-1.6) and ≥2 contacts with TB infection were more likely to themselves have TB (aRR: 2.6; 95% CI: 2.2-2.9). CONCLUSIONS Social networks with higher centrality, more men, contacts with HIV, and TB infection, were positively associated with TB infection. TB transmission within measurable social networks may explain prevalent TB not associated with a household contact. Further study on network-informed TB case finding interventions is warranted.

2022-08-19 — Blurring cluster randomized trials and observational studies: Two-Stage TMLE for subsampling, missingness, and few independent units.

Authors: Joshua R Nugent, C. Marquez, E. Charlebois, Rachel Abbott, L. Balzer
Year: 2022
Publication Date: 2022-08-19
Venue: Biostatistics
DOI: 10.1093/biostatistics/kxad015
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Cluster randomized trials (CRTs) often enroll large numbers of participants; yet due to resource constraints, only a subset of participants may be selected for outcome assessment, and those sampled may not be representative of all cluster members. Missing data also present a challenge: if sampled individuals with measured outcomes are dissimilar from those with missing outcomes, unadjusted estimates of arm-specific endpoints and the intervention effect may be biased. Further, CRTs often enroll and randomize few clusters, limiting statistical power and raising concerns about finite sample performance. Motivated by SEARCH-TB, a CRT aimed at reducing incident tuberculosis infection, we demonstrate interlocking methods to handle these challenges. First, we extend Two-Stage targeted minimum loss-based estimation to account for three sources of missingness: (i) subsampling; (ii) measurement of baseline status among those sampled; and (iii) measurement of final status among those in the incidence cohort (persons known to be at risk at baseline). Second, we critically evaluate the assumptions under which subunits of the cluster can be considered the conditionally independent unit, improving precision and statistical power but also causing the CRT to behave like an observational study. Our application to SEARCH-TB highlights the real-world impact of different assumptions on measurement and dependence; estimates relying on unrealistic assumptions suggested the intervention increased the incidence of TB infection by 18% (risk ratio [RR]=1.18, 95% confidence interval [CI]: 0.85-1.63), while estimates accounting for the sampling scheme, missingness, and within community dependence found the intervention decreased the incident TB by 27% (RR=0.73, 95% CI: 0.57-0.92).

2022-08-16 — Self-Expandable Metal Stent as a Bridge to Surgery for Left-Sided Acute Malignant Colorectal Obstruction: Optimal Timing for Elective Surgery

Authors: Shuxian Chen, Sisi Zhou, Yiting Lin, Wenwen Xue, Zeyu Huang, Jing Yu, Zefeng Yu, Suzuan Chen
Year: 2022
Publication Date: 2022-08-16
Venue: Computational and Mathematical Methods in Medicine
DOI: 10.1155/2022/6015729
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Objectives This randomized, single-center, retrospective, comparative cohort study is aimed at investigating the optimal time interval from self-expandable metal stent (SEMS) placement to surgery and potential risk factors for complications in patients with acute malignant colorectal obstruction. Methods A total of 64 patients with left-sided acute malignant colorectal obstruction treated with SEMS placement and subsequent surgery between January 2013 and September 2020 were enrolled and allocated to a case group (SEMS placing time ≤ 14 days; n = 19 patients) and a control group (SEMS placing time > 14 days; n = 45 patients). The primary outcome was the difference in baseline information, patients' conditions during surgery, and postoperative conditions between the two groups. The secondary outcome included potential risk factors of postoperative complications. The propensity score matching (PSM) and super learner (SL) methods were used to eliminate multiple confounding factors of baseline data. A cohort of 21 samples was used for external validation, comprising 6 cases and 15 controls. Results A significant difference was observed between the two groups in intraoperative blood loss (P = 0.009), postoperative hospital stay (P = 0.002), postoperative complications (Clavien-Dindo grading ≥ II) (P < 0.001), stoma creation (P < 0.001), and primary anastomosis (P < 0.001). After a 1 : 3 PSM analysis, no statistically significant differences between eight confounding variables of the two groups were observed (P > 0.05). Caliper set as 0.2 multiple logistic regression analysis showed that the potential risk factor for postoperative complications was SEMS placing time (RR = 0.109, 95% confidence interval (CI) = 0.028-0.433; P = 0.002), indicating that SEMS placing time > 14 days was an independent risk factor for postoperative complications in bridge-to-surgery (BTS) setting. The area under the AUC curve was 76.7% and validated using the validation cohort. 
Conclusions Long duration of SEMS placement (>14 days) may not influence surgical difficulty but could increase the risk of postoperative complications.

2022-08-10 — Limited clinical utility of a machine learning revision prediction model based on a national hip arthroscopy registry

Authors: R. K. Martin, S. Wastvedt, Jeppe Lange, Ayoosh Pareek, J. Wolfson, B. Lund
Year: 2022
Publication Date: 2022-08-10
Venue: Knee Surgery, Sports Traumatology, Arthroscopy
DOI: 10.1007/s00167-022-07054-8
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Purpose Accurate prediction of outcome following hip arthroscopy is challenging and machine learning has the potential to improve our predictive capability. The purpose of this study was to determine if machine learning analysis of the Danish Hip Arthroscopy Registry (DHAR) can develop a clinically meaningful calculator for predicting the probability of a patient undergoing subsequent revision surgery following primary hip arthroscopy. Methods Machine learning analysis was performed on the DHAR. The primary outcome for the models was probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. Data were split randomly into training (75%) and test (25%) sets. Four models intended for these types of data were tested: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. These four models represent a range of approaches to statistical details like variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). Analysis was performed using only variables available in the pre-operative clinical setting and then repeated to compare model performance using all variables available in the registry. Results In total, 5581 patients were included for analysis. Average follow-up time or time-to-revision was 4.25 years (± 2.51) years and overall revision rate was 11%. All four models were generally well calibrated and demonstrated concordance in the moderate range when restricted to only pre-operative variables (0.62–0.67), and when considering all variables available in the registry (0.63–0.66). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true accuracy of the models. Conclusion The association between pre-surgical factors and outcome following hip arthroscopy is complex. 
Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness. Prediction accuracy would benefit from enhanced data quality within the registry and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalisable across the population. Level of evidence Level III.

2022-08-02 — Reciprocal perspective as a super learner improves drug-target interaction prediction (MUSDTI)

Authors: K. Dick, Daniel G. Kyrollos, Eric D. Cosoreanu, Joseph Dooley, Joshua S. Fryer, S. Gordon, Nikhil Kharbanda, M. Klamrowski, Patrick N. L. LaCasse, Thomas F. Leung, Muneeb Nasir, Chang Qiu, Aisha Robinson, Derek Shao, Boyan R. Siromahov, Evening Starlight, Christopher Tran, Christopher Wang, Yu-Kai Yang, J.R. Green
Year: 2022
Publication Date: 2022-08-02
Venue: Scientific Reports
DOI: 10.1038/s41598-022-16493-9
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The identification of novel drug-target interactions (DTI) is critical to drug discovery and drug repurposing to address contemporary medical and public health challenges presented by emergent diseases. Historically, computational methods have framed DTI prediction as a binary classification problem (indicating whether or not a drug physically interacts with a given protein target); however, framing the problem instead as a regression-based prediction of the physiochemical binding affinity is more meaningful. With growing databases of experimentally derived drug-target interactions (e.g. Davis, Binding-DB, and Kiba), deep learning-based DTI predictors can be effectively leveraged to achieve state-of-the-art (SOTA) performance. In this work, we formulated a DTI competition as part of the coursework for a senior undergraduate machine learning course and challenged students to generate component DTI models that might surpass SOTA models and effectively combine these component models as part of a meta-model using the Reciprocal Perspective (RP) multi-view learning framework. Following 6 weeks of concerted effort, 28 student-produced component deep-learning DTI models were leveraged in this work to produce a new SOTA RP-DTI model, denoted the Meta Undergraduate Student DTI (MUSDTI) model. Through a series of experiments we demonstrate that (1) RP can considerably improve SOTA DTI prediction, (2) our new double-cold experimental design is more appropriate for emergent DTI challenges, (3) that our novel MUSDTI meta-model outperforms SOTA models, (4) that RP can improve upon individual models as an ensembling method, and finally, (5) RP can be utilized for low computation transfer learning. This work introduces a number of important revelations for the field of DTI prediction and sequence-based, pairwise prediction in general.

2022-08-02 — Performance of LTMLE in the Presence of Missing Data in Control-Matched Longitudinal Studies

Authors: Sue-Jane Wang, Zhipeng Huang, Hai Zhu
Year: 2022
Publication Date: 2022-08-02
Venue: Statistics in Biopharmaceutical Research
DOI: 10.1080/19466315.2022.2108136
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Conventional controlled trials employ randomization and blinding to ensure the balance of baseline covariates between study arms. Treatment effect is formally tested via a pre-specified hypothesis reflecting trial’s primary objective defined by the primary efficacy endpoint, such as, is experimental treatment effective in slowing cognitive decline at a pre-specified landmark time in a neurologic therapeutic development? To address the same clinical question, but, as a safety endpoint in an observational study post radiopharmaceutical imaging drug approval due to concerns of radiation retention in the brain after accumulated real-world use over time, the targeted minimum loss-based estimation (TMLE) method has been suggested. Overall, for a longitudinal control-matched cohort study, assessing treatment effect (efficacy or safety) in the presence of missingness can be very challenging, which depends also on trial duration and missingness pattern of outcome data between study arms. TMLE is a two-step procedure to estimate a target parameter and has been shown to be doubly robust. The objectives of our research are fourfold. First, we investigate the performance of the TMLE method using longitudinal structure (LTMLE) in a prospective control-matched longitudinal cohort study. Second, we evaluate the performance of LTMLE and a few methods that are often proposed in regulatory submissions for longitudinal data analysis or in the presence of missing data with various missing data mechanisms. Third, we assess the impacts of various missing data mechanisms to treatment effect estimation via extensive simulation studies. Finally, we discuss the results of the simulation studies and their implications to conducting a feasible real-world control-matched longitudinal cohort study.

2022-08-01 — Using Targeted Maximum Likelihood Estimation to Estimate Treatment Effect with Longitudinal Continuous or Binary Data: A Systematic Evaluation of 28 Diabetes Clinical Trials

Authors: Lingjing Jiang, Michael Rosenblum, Yu Du
Year: 2022
Publication Date: 2022-08-01
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Summary: The primary analysis of clinical trials in diabetes therapeutic area often involves a mixed-model repeated measure (MMRM) approach to estimate the average treatment effect for longitudinal continuous outcome, and a generalized linear mixed model (GLMM) approach for longitudinal binary outcome. In this paper, we considered another estimator of the average treatment effect, called targeted maximum likelihood estimator (TMLE). This estimator can be a one-step alternative to model either continuous or binary outcome. We compared those estimators by simulation studies and by analyzing real data from 28 diabetes clinical trials. The simulations involved different missing data scenarios, and the real data sets covered a wide range of possible distributions of the outcome and covariates in real-life clinical trials for diabetes drugs with different mechanisms of action. For all the settings, adjusted estimators tended to be more efficient than the unadjusted one. In the setting of longitudinal continuous outcome, the MMRM approach with visits and baseline variables interaction appeared to dominate the performance of the MMRM considering the main effects only for the baseline variables while showing better or comparable efficiency to the TMLE estimator in both simulations and data applications. For modeling longitudinal binary outcome, TMLE generally outperformed GLMM in terms of relative efficiency, and its avoidance of the cumbersome covariance fitting procedure from GLMM makes TMLE a more advantageous estimator. The

2022-08-01 — Using multiple imputation by super learning to assign intent to nonfatal firearm injuries.

Authors: Thomas Carpenito, Matthew Miller, Justin Manjourides, D. Azrael
Year: 2022
Publication Date: 2022-08-01
Venue: Preventive Medicine
DOI: 10.1016/j.ypmed.2022.107183
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2022-08-01 — Super Learner Ensemble for Anomaly Detection and Cyber-Risk Quantification in Industrial Control Systems

Authors: Gabriela Ahmadi-Assalemi, Haider M. Al-Khateeb, Gregory Epiphaniou, Amar Aggoun
Year: 2022
Publication Date: 2022-08-01
Venue: IEEE Internet of Things Journal
DOI: 10.1109/jiot.2022.3144127
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Industrial control systems (ICSs) are integral parts of smart cities and critical to modern societies. Despite indisputable opportunities introduced by disruptor technologies, they proliferate the cybersecurity threat landscape, which is increasingly more hostile. The quantum of sensors utilized by ICS aided by artificial intelligence (AI) enables data collection capabilities to facilitate automation, process streamlining, and cost reduction. However, apart from the operational use, the sensors generated data combined with AI can be innovatively utilized to model anomalous behavior as part of layered security to increase resilience to cyberattacks. We introduce a framework to profile anomalous behavior in ICS and derive a cyber-risk score. A novel super learner ensemble for one-class classification is developed, using overlapping rolling windows with stratified, $k$-fold, $n$-repeat cross-validation applied to each base learner followed by majority voting to derive the best learner. Our approach is demonstrated on a liquid distribution sensor data set. The experimental results reveal that the proposed technique achieves an overall $F1$-score of 99.13%, an anomalous recall score of 99% detecting anomalies lasting only 17 s. The key strength of the framework is the low computational complexity and error rate. The framework is modular, generic, applicable to other ICS, and transferable to other smart city sectors.

2022-08-01 — Estimating the effect of donor sex on red blood cell transfused patient mortality: A retrospective cohort study using a targeted learning and emulated trials-based approach

Authors: Peter Bruun-Rasmussen, P. Andersen, K. Banasik, S. Brunak, P. Johansson
Year: 2022
Publication Date: 2022-08-01
Venue: EClinicalMedicine
DOI: 10.1016/j.eclinm.2022.101628
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Observational studies determining the effect of red blood cell (RBC) donor sex on recipient mortality have been inconsistent. Emulating hypothetical randomized target trials using large real-world data and targeted learning may clarify potential adverse effects. Methods: In this retrospective cohort study, a RBC transfusion database from the Capital Region of Denmark comprising more than 900,000 transfusion events defined the observational data. Eligible patients were minimum 18 years, had received a leukocyte-reduced RBC transfusion, and had no history of RBC transfusions within the past year at baseline. The doubly robust targeted maximum likelihood estimation method coupled with ensembled machine learning was used to emulate sex-stratified target trials determining the comparative effectiveness of exclusively transfusing RBC units from either male or female donors. The outcome was all-cause mortality within 28 days of the baseline-transfusion. Estimates were adjusted for the total number of transfusions received on each day k, hospital of transfusion, calendar period, patient age and sex, ABO/RhD blood group of the patient, Charlson comorbidity score, the total number of transfusions received prior to day k, and the number of RBC units received on each day k from donors younger than 40 years of age. Findings: Among 98,167 adult patients who were transfused between Jan. 1, 2008, and Apr. 10, 2018, a total of 90,917 patients (54.6% female) were eligible. For male patients, the 28-day survival was 2.06 percentage points (pp) (95 % confidence interval [CI]: 1.81-2.32, P<0.0001) higher under treatment with RBC units exclusively from male donors compared with exclusively from female donors. In female patients, exclusively transfusing RBC units from either male or female donors increased the 28-day survival with 0.64pp (0.52-0.76, P<0.0001), and 0.62pp (0.49-0.75, P<0.0001) compared with the current practice, respectively. 
No evidence of a sex-specific donor effect was found for female patients (0.02pp [-0.18-0.22]). The sensitivity analyses showed that a large unknown causal bias would have to be present to affect the conclusions. Interpretation: The results suggest that a sex-matched transfusion policy may benefit patients. However, a causal interpretation of the findings relies on the assumption of no unmeasured confounding, treatment consistency, positivity, and minimal model misspecifications. Funding: Novo Nordisk Foundation and the Innovation Fund Denmark.

2022-07-31 — Transferring Targeted Maximum Likelihood Estimation for Causal Inference into Sports Science

Authors: Talko B. Dijkhuis, F. Blaauw
Year: 2022
Publication Date: 2022-07-31
Venue: Entropy
DOI: 10.3390/e24081060
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Although causal inference has shown great value in estimating effect sizes in, for instance, physics, medical studies, and economics, it is rarely used in sports science. Targeted Maximum Likelihood Estimation (TMLE) is a modern method for performing causal inference. TMLE is forgiving in the misspecification of the causal model and improves the estimation of effect sizes using machine-learning methods. We demonstrate the advantage of TMLE in sports science by comparing the calculated effect size with a Generalized Linear Model (GLM). In this study, we introduce TMLE and provide a roadmap for making causal inference and apply the roadmap along with the methods mentioned above in a simulation study and case study investigating the influence of substitutions on the physical performance of the entire soccer team (i.e., the effect size of substitutions on the total physical performance). We construct a causal model, a misspecified causal model, a simulation dataset, and an observed tracking dataset of individual players from 302 elite soccer matches. The simulation dataset results show that TMLE outperforms GLM in estimating the effect size of the substitutions on the total physical performance. Furthermore, TMLE is most robust against model misspecification in both the simulation and the tracking dataset. However, independent of the method used in the tracking dataset, it was found that substitutes increase the physical performance of the entire soccer team.

2022-07-27 — Estimation of the Average Causal Effect in Longitudinal Data With Time-Varying Exposures: The Challenge of Non-Positivity and the Impact of Model Flexibility.

Authors: Jacqueline E. Rudolph, David C. Benkeser, Edward H. Kennedy, E. Schisterman, A. Naimi
Year: 2022
Publication Date: 2022-07-27
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwac136
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
There are important challenges to the estimation and identification of average causal effects in longitudinal data with time-varying exposures. Here, we discuss the difficulty in meeting the positivity condition. Our motivating example is the per-protocol analysis of the Effects of Aspirin in Gestation and Reproduction trial. We estimated the average causal effect comparing incidence of pregnancy by 26 weeks had all women been assigned to aspirin and complied versus been assigned to placebo and complied. Using flexible targeted minimum loss-based estimation, we estimated a risk difference of 1.27% (95% CI: -9.83%, 12.38%). Using a less flexible inverse probability weighting approach, the risk difference was 5.77% (95% CI: -1.13%, 13.05%). However, the cumulative probability of compliance conditional on covariates approached zero as follow-up accrued, indicating a practical violation of the positivity assumption, which limited our ability to make causal interpretations. The effects of non-positivity were more apparent when using a more flexible estimator, as indicated by the greater imprecision. When faced with non-positivity, one can use a flexible approach and be transparent about the uncertainty, use a parametric approach and smooth over gaps in the data, or target a different estimand which will be less vulnerable to positivity violations.
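The practical positivity violation described in this abstract can be illustrated with a small sketch. It assumes a hypothetical constant per-week compliance probability of 0.9 (an illustrative value, not a figure from the paper) and shows how the cumulative probability of compliance over 26 weeks shrinks while the corresponding unstabilized inverse probability weight grows.

```python
from itertools import accumulate
from operator import mul


def cumulative_compliance(per_interval_probs):
    """Running product of conditional compliance probabilities."""
    return list(accumulate(per_interval_probs, mul))


# Hypothetical values for illustration only: 26 weekly intervals, each with
# a 0.9 probability of remaining compliant given covariates and history.
weekly_probs = [0.9] * 26
cum_probs = cumulative_compliance(weekly_probs)

# Unstabilized inverse probability weight for a participant who complied
# through each week: the reciprocal of the cumulative probability.
weights = [1.0 / p for p in cum_probs]

for week in (1, 13, 26):
    print(
        f"week {week:2d}: P(compliant so far) = {cum_probs[week - 1]:.3f}, "
        f"weight = {weights[week - 1]:.1f}"
    )
```

Even with a fairly high per-week probability, the cumulative probability falls quickly, so a handful of participants can carry very large weights. That is one way the imprecision mentioned in the abstract can arise.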

2022-07-18 — Targeted maximum likelihood estimation of causal effects with interference: A simulation study

Authors: P. Zivich, M. Hudgens, M. Brookhart, J. Moody, David J Weber, A. Aiello
Year: 2022
Publication Date: 2022-07-18
Venue: Statistics in Medicine
DOI: 10.1002/sim.9525
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Interference, the dependency of an individual's potential outcome on the exposure of other individuals, is a common occurrence in medicine and public health. Recently, targeted maximum likelihood estimation (TMLE) has been extended to settings of interference, including in the context of estimation of the mean of an outcome under a specified distribution of exposure, referred to as a policy. This paper summarizes how TMLE for independent data is extended to general interference (network‐TMLE). An extensive simulation study is presented of network‐TMLE, consisting of four data generating mechanisms (unit‐treatment effect only, spillover effects only, unit‐treatment and spillover effects, infection transmission) in networks of varying structures. Simulations show that network‐TMLE performs well across scenarios with interference, but issues manifest when policies are not well‐supported by the observed data, potentially leading to poor confidence interval coverage. Guidance for practical application, freely available software, and areas of future work are provided.

2022-07-07 — Super learner machine‐learning algorithms for compressive strength prediction of high performance concrete

Authors: Seunghye Lee, Ngoc Hung Nguyen, Armagan Karamanli, Jaehong Lee, T. Vo
Year: 2022
Publication Date: 2022-07-07
Venue: Structural Concrete
DOI: 10.1002/suco.202200424
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Because the relationship between the compressive strength of high‐performance concrete (HPC) and its composition is highly nonlinear, more advanced regression methods are needed to obtain better results. Super learner models, which are based on several ensemble methods including random forest regression (RFR), adaptive boosting (AdaBoost), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical gradient boosting (CatBoost), are used to solve this complicated problem. A grid search method is employed to determine the best set of hyper‐parameters of each ensemble algorithm. Two super learner models, which combine all six models or select the top three effective ones as the base learners, are then proposed to develop an accurate approach to estimate the compressive strength of HPC. The results on four popular datasets show significant improvement of the proposed super learner models in terms of prediction accuracy. They also reveal that the trained models always perform better than other methods since their errors (MAE, MSE, RMSE) are always much lower and values of $R^2$ are higher than those of the previous studies. The proposed super learner models can be used to provide a reliable tool for mixture design optimization of the HPC.

2022-07-01 — Prediction of Global Psychological Stress and Coping Induced by the COVID-19 Outbreak: A Machine Learning Study

Authors: Neha Prerna Tigga, S. Garg
Year: 2022
Publication Date: 2022-07-01
Venue: ALPHA PSYCHIATRY
DOI: 10.5152/alphapsychiatry.2022.21797
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Artificial intelligence and machine learning have enormous potential to deal efficiently with a wide range of issues that traditional sciences may be unable to address. Neuroscience, particularly psychiatry, is one of the domains that could potentially benefit from artificial intelligence and machine learning. This study aims to predict Stress and assess Coping with stress mechanisms during the COVID-19 pandemic and, therefore, help establish a successful intervention to manage distress. Methods: COVIDiSTRESS global survey data was used in this study and comprised 70 652 respondents after pre-processing. Binary classification is performed for predicting Stress and Coping with stress, while 2 ensemble machine learning algorithms, deep super learner and cascade deep forest, and state-of-the-art methods are explored for classification. Correlation attribute evaluation is used for feature significance. Statistical analysis, such as Cronbach’s alpha, demographic statistics, Pearson’s correlation coefficient, independent sample t-test, and 95% CI, is also performed. Results: Globally, females, the younger population, and those in COVID-19 risk groups are observed to possess higher levels of stress. Trust, Loneliness, and Distress are found to be the primary predictors of Stress, whereas the significant predictors for coping with stress are identified as Social Provision, Extroversion, and Agreeableness. Deep super learner and cascade deep forest outperformed the state-of-the-art methods with an accuracy of up to 88.42%. Conclusions: By comparing different classifiers, we can conclude that the multi-layer ensemble outperforms all others. Another aim of this study is the ability to regulate demographic and negative psychological states with a goal of medical interventions and to work towards building multiple coping strategies to reduce stress and promote resilience and recovery from COVID-19.

2022-07-01 — P29 Comparing G-Computation, Propensity Score-Based Weighting, and Targeted Maximum Likelihood Estimation for Analyzing Externally Controlled Trials with an Unmeasured Confounder: A Simulation Study

Authors: J. Ren, P. Cislo, J. Cappelleri, P. Hlavacek, M. Dibonaventura
Year: 2022
Publication Date: 2022-07-01
Venue: Value in Health
DOI: 10.1016/j.jval.2022.04.039
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2022-07-01 — Long-Term Associations between Disaster-Related Home Loss and Health and Well-Being of Older Survivors: Nine Years after the 2011 Great East Japan Earthquake and Tsunami

Authors: K. Shiba, H. Hikichi, S. Okuzono, T. VanderWeele, Mariana C. Arcaya, Adel Daoud, R. Cowden, A. Yazawa, D. Zhu, J. Aida, K. Kondo, I. Kawachi
Year: 2022
Publication Date: 2022-07-01
Venue: Environmental Health Perspectives
DOI: 10.1289/EHP10903
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Little research has examined associations between disaster-related home loss and multiple domains of health and well-being, with extended long-term follow-up and comprehensive adjustment for pre-disaster characteristics of survivors. Objectives: We examined the longitudinal associations between disaster-induced home loss and 34 indicators of health and well-being, assessed ∼9y post-disaster. Methods: We used data from a preexisting cohort study of Japanese older adults in an area directly impacted by the 2011 Japan Earthquake (n=3,350 and n=2,028, depending on the outcomes). The study was initiated in 2010, and disaster-related home loss status was measured in 2013 retrospectively. The 34 outcomes were assessed in 2020 and covered dimensions of physical health, mental health, health behaviors/sleep, social well-being, cognitive social capital, subjective well-being, and prosocial/altruistic behaviors. We estimated the associations between disaster-related home loss and the outcomes, using targeted maximum likelihood estimation and SuperLearner. We adjusted for pre-disaster characteristics from the wave conducted 7 months before the disaster (i.e., 2010), including prior outcome values that were available. Results: After Bonferroni correction for multiple testing, we found that home loss (vs. no home loss) was associated with increased posttraumatic stress symptoms (standardized difference=0.50; 95% CI: 0.35, 0.65), increased daily sleepiness (0.38; 95% CI: 0.21, 0.54), lower trust in the community (−0.36; 95% CI: −0.53, −0.18), lower community attachment (−0.60; 95% CI: −0.75, −0.45), and lower prosociality (−0.39; 95% CI: −0.55, −0.24). We found modest evidence for the associations with increased depressive symptoms, increased hopelessness, more chronic conditions, higher body mass index, lower perceived mutual help in the community, and decreased happiness. There was little evidence for associations with the remaining 23 outcomes. 
Discussion: Home loss due to a disaster may have long-lasting adverse impacts on the cognitive social capital, mental health, and prosociality of older adult survivors.

2022-07-01 — Identifying risk factors associated with hepatitis C virus infection in participants in the national health and nutrition examination survey using Super Learner

Authors: L. Telep, Rachael Phillips, A. Chokkalingam
Year: 2022
Publication Date: 2022-07-01
Venue: Journal of Hepatology
DOI: 10.1016/s0168-8278(22)01444-1
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2022-06-30 — Targeted learning in observational studies with multi‐valued treatments: An evaluation of antipsychotic drug treatment safety

Authors: Jason Poulos, M. Horvitz-Lennon, Katya Zelevinsky, T. Cristea-Platon, Thomas Huijskens, Pooja Tyagi, Jiaju Yan, Jordi Diaz, Sharon-Lise T. Normand
Year: 2022
Publication Date: 2022-06-30
Venue: Statistics in Medicine
DOI: 10.1002/sim.10003
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We investigate estimation of causal effects of multiple competing (multi‐valued) treatments in the absence of randomization. Our work is motivated by an intention‐to‐treat study of the relative cardiometabolic risk of assignment to one of six commonly prescribed antipsychotic drugs in a cohort of nearly 39 000 adults with serious mental illnesses. Doubly‐robust estimators, such as targeted minimum loss‐based estimation (TMLE), require correct specification of either the treatment model or outcome model to ensure consistent estimation; however, common TMLE implementations estimate treatment probabilities using multiple binomial regressions rather than multinomial regression. We implement a TMLE estimator that uses multinomial treatment assignment and ensemble machine learning to estimate average treatment effects. Our multinomial implementation improves coverage, but does not necessarily reduce bias, relative to the binomial implementation in simulation experiments with varying treatment propensity overlap and event rates. Evaluating the causal effects of the antipsychotics on 3‐year diabetes risk or death, we find a safety benefit of moving from a second‐generation drug considered among the safest of the second‐generation drugs to an infrequently prescribed first‐generation drug known for having low cardiometabolic risk.

2022-06-23 — Forecasting the cost of drought events in France by Super Learning

Authors: Geoffrey Ecoto, A. Chambaz
Year: 2022
Publication Date: 2022-06-23
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Drought events are the second most expensive type of natural disaster within the French legal framework known as the natural disasters compensation scheme. In recent years, drought events have been remarkable in their geographical scale and intensity. We develop and apply a new methodology to forecast the cost of a drought event in France. The methodology hinges on Super Learning (van der Laan et al., 2007; Benkeser et al., 2018), a general aggregation strategy to learn a feature of the law of the data identified through an ad hoc risk function by relying on a library of algorithms. The algorithms either compete (discrete Super Learning) or collaborate (continuous Super Learning), with a cross-validation scheme determining the best performing algorithm or combination of algorithms, respectively. Our Super Learner takes into account the complex dependence structure induced in the data by the spatial and temporal nature of drought events.
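The distinction this abstract draws between learners that compete (discrete Super Learning) and learners that collaborate (continuous Super Learning) can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the authors' method: the three toy learners, the simulated i.i.d. data, and the grid search over convex weights are all hypothetical stand-ins, and the sketch ignores the spatial and temporal dependence that the paper's Super Learner accounts for.

```python
import random


def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def fit_1nn(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Hypothetical library of three toy learners.
LIBRARY = {"mean": fit_mean, "line": fit_line, "1nn": fit_1nn}


def cv_predictions(xs, ys, k=5):
    """Out-of-fold predictions from every learner in the library."""
    n = len(xs)
    preds = {name: [0.0] * n for name in LIBRARY}
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for name, fit in LIBRARY.items():
            model = fit(tx, ty)
            for i in held_out:
                preds[name][i] = model(xs[i])
    return preds


def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)


def discrete_super_learner(preds, ys):
    """Learners compete: keep the one with the lowest cross-validated risk."""
    risks = {name: mse(p, ys) for name, p in preds.items()}
    return min(risks, key=risks.get), risks


def continuous_super_learner(preds, ys, grid=20):
    """Learners collaborate: convex weights minimizing cross-validated risk.

    A coarse grid search over the simplex, written for exactly three learners.
    """
    names = list(preds)
    best_w, best_risk = None, float("inf")
    for a in range(grid + 1):
        for b in range(grid + 1 - a):
            w = (a / grid, b / grid, (grid - a - b) / grid)
            combo = [
                sum(wj * preds[nm][i] for wj, nm in zip(w, names))
                for i in range(len(ys))
            ]
            risk = mse(combo, ys)
            if risk < best_risk:
                best_w, best_risk = dict(zip(names, w)), risk
    return best_w, best_risk


# Simulated data for illustration only: a noisy linear signal.
random.seed(0)
xs = [random.uniform(0, 3) for _ in range(200)]
ys = [2 * x + random.gauss(0, 0.5) for x in xs]

preds = cv_predictions(xs, ys)
winner, risks = discrete_super_learner(preds, ys)
weights, combo_risk = continuous_super_learner(preds, ys)
print("discrete choice:", winner)
print("continuous weights:", weights)
```

Because the weight grid includes the three corners of the simplex, the combined cross-validated risk in this sketch can never exceed that of the single best learner. Practical implementations usually replace the grid search with a constrained optimizer.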

2022-06-15 — Abstract 3348: Impact of comedications on pCR rates and relapse in breast cancer. Analysis of the Saint-Louis observational cohort

Authors: A. Hamy, A. Kassara, H. Hocini, Clémentine Garin, L. Teixeira, C. Cuvier, P. Gougis, Elise Dumas, F. Reyal, B. Grandal, Nadir Sella, Eric Daoud, A. Latouche, T. Dubois, Annabelle Ballesta, S. Alsafadi, E. D. Nery, Élodie Anthony, Benjamin Marande, C. De Bazelaire, A. de Roquancourt, C. Michel, S. Giacchetti, M. Espié
Year: 2022
Publication Date: 2022-06-15
Venue: Cancer Research
DOI: 10.1158/1538-7445.am2022-3348
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Context: There is a growing interest in drug repurposing and pathological complete response (pCR) that may influence the progression and treatment of breast cancer (BC). However, few studies focus on the influence of comedications, i.e., non-anticancerous drugs taken for coexisting conditions in cancer patients on neoadjuvant chemotherapy (NAC), and even less regarding the impact of response to treatment and relapse in breast cancer. Objectives: To assess whether the use of comedications modifies pCR and patient relapse probability in BC. Methods: We retrospectively analyzed data from Saint-Louis Hospital (Paris, France). Characteristics from 664 patients with neoadjuvant chemotherapy. Response to chemotherapy was assessed by pathological complete response (pCR). We analyzed comedication according to levels 1 and 2 of the Anatomical Therapeutic Chemical Classification System (ATC). A chronic comedication was defined by a comedication declared at diagnosis, excluding local and/or non-continuous administration. To estimate the average causal effect of comedication on pCR, we employed Inverse Probability Weighting (IPW) and Super Learner strategy to pick the best regression model. We used a Cox multivariate regression model to analyze the average causal effect of comedications on relapse probability. Results: 664 patients were included in this study. The median age at inclusion was 51.4 years. Of 664 patients, 194 patients (29.2%) had at least one comedication (433 total comedications). The repartition of comedications, according to the 1st level of ATC, was as follows: Cardiovascular system (C): 40.2% (n=174), Nervous system (N): 23.8% (n=103) and Alimentary tract and metabolism (A): 15.2% (n=66). Among the population with collected pCR, 112 tumors achieved pCR (18.6%). After IPW adjusted for clinical, pathological, and treatment variables, C03 (Diuretics) was associated with an increased likelihood of positive pCR (C03 versus no C03, OR = 5.0, CI95% [1.25-12.2]). 
By contrast, N06 (Psychoanaleptics) was associated with a decrease in pCR rates (OR = 0.3, CI95% [0.1-0.6]). The multivariate survival analysis showed a significant effect on the relapse probability of Selective Serotonin reuptake inhibitors (SSRIs, N06) (OR = 2.3, CI95% [1.2-4.4], p = 0.01). Discussion: In this observational analysis, the use of chronic cardiovascular diuretics (C03) during NAC was associated with improvement of pCR rates. On the contrary, Psychoanaleptics (N06) were significantly associated with lower pCR rates and a higher probability of relapse. This finding calls for further research on the interactions between chemotherapy, nervous system drugs such as SSRIs, and pathological complete response. Citation Format: Anne-Sophie Hamy, Amyn Kassara, Hamid Hocini, Clementine Garin, Luis Teixeira, Caroline Cuvier, Paul Gougis, Elise Dumas, Fabien Reyal, Beatriz Grandal, Nadir Sella, Eric Daoud, Aurélien Latouche, Thierry Dubois, Annabelle Ballesta, Samar Alsafadi, Elaine Del Nery, Élodie Anthony, Benjamin Marande, Cedric de Bazelaire, Anne de Roquancourt, Catherine Michel, Sylvie Giacchetti, Marc Espie. Impact of comedications on pCR rates and relapse in breast cancer. Analysis of the Saint-Louis observational cohort [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 3348.

2022-06-06 — Estimators for the value of the optimal dynamic treatment rule with application to criminal justice interventions

Authors: L. Montoya, M. J. van der Laan, Jennifer L. Skeem, M. Petersen
Year: 2022
Publication Date: 2022-06-06
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2020-0128
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Given an (optimal) dynamic treatment rule, it may be of interest to evaluate that rule – that is, to ask the causal question: what is the expected outcome had every subject received treatment according to that rule? In this paper, we study the performance of estimators that approximate the true value of: (1) an a priori known dynamic treatment rule; (2) the true, unknown optimal dynamic treatment rule (ODTR); (3) an estimated ODTR, a so-called “data-adaptive parameter,” whose true value depends on the sample. Using simulations of point-treatment data, we specifically investigate: (1) the impact of increasingly data-adaptive estimation of nuisance parameters and/or of the ODTR on performance; (2) the potential for improved efficiency and bias reduction through the use of semiparametric efficient estimators; and, (3) the importance of sample splitting based on the cross-validated targeted maximum likelihood estimator (CV-TMLE) for accurate inference. In the simulations considered, there was very little cost and many benefits to using CV-TMLE to estimate the value of the true and estimated ODTR; importantly, and in contrast to non-cross-validated estimators, the performance of CV-TMLE was maintained even when highly data-adaptive algorithms were used to estimate both nuisance parameters and the ODTR. In addition, we apply these estimators for the value of the rule to the “Interventions” study, an ongoing randomized controlled trial, to identify whether assigning cognitive behavioral therapy (CBT) to criminal justice-involved adults with mental illness using an ODTR significantly reduces the probability of recidivism, compared to assigning CBT in a non-individualized way.

2022-06-05 — MISL: Multiple imputation by super learning

Authors: Thomas Carpenito, J. Manjourides
Year: 2022
Publication Date: 2022-06-05
Venue: Statistical Methods in Medical Research
DOI: 10.1177/09622802221104238
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Multiple imputation techniques are commonly used when data are missing; however, there are many options one can consider. Multivariate imputation by chained equations is a popular method for generating imputations but relies on specifying models when imputing missing values. In this work, we introduce multiple imputation by super learning, an update to the multivariate imputation by chained equations method to generate imputations with ensemble learning. Ensemble methodologies have recently gained attention for use in inference and prediction as they optimally combine a variety of user-specified parametric and non-parametric models and perform well when estimating complex functions, including those with interaction terms. Through two simulations we compare inferences made using the multiple imputation by super learning approach to those made with other commonly used multiple imputation methods and demonstrate multiple imputation by super learning as a superior option when considering characteristics such as bias, confidence interval coverage rate, and confidence interval width.

2022-06-02 — Healthcare Data Security Using IoT Sensors Based on Random Hashing Mechanism

Authors: A. Khadidos, Shitharth Selvarajan, Alaa O. Khadidos, K. Sangeetha, K. Alyoubi
Year: 2022
Publication Date: 2022-06-02
Venue: J. Sensors
DOI: 10.1155/2022/8457116
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Providing security to the healthcare data stored in an IoT-cloud environment is one of the most challenging and demanding tasks in recent days. Because the IoT-cloud framework is built from an enormous number of sensors that generate a massive amount of data, it is more susceptible to vulnerabilities and attacks, in which malicious activities degrade the security level of the network. Hence, Artificial Intelligence (AI) technology is the most suitable option for healthcare applications because it provides the best solution for improving the security and reliability of data. Due to this fact, various AI-based security mechanisms are implemented in the conventional works for the IoT-cloud framework. However, these mechanisms face significant problems of increased complexity in algorithm design, inefficient data handling, unsuitability for processing unstructured data, increased cost of IoT sensors, and more time consumption. Therefore, this paper proposes an AI-based intelligent feature learning mechanism named Probabilistic Super Learning- (PSL-) Random Hashing (RH) for improving the security of healthcare data stored in IoT-cloud. This paper also aims to reduce the cost of IoT sensors by implementing the proposed learning model. Here, the training model has been maintained for detecting the attacks at the initial stage, where the properties of the reported attack are updated for learning the characteristics of attacks. In addition, the random key is generated based on the hash value of the data matrix, which is incorporated with the standard Elliptic Curve Cryptography (ECC) technique for data security. Then, the enhanced ECC-RH mechanism performs the data encryption and decryption processes with the generated random hash key. During performance evaluation, the results of both existing and proposed techniques are validated and compared using different performance indicators.

2022-06-01 — 737-P: Targeted Maximum Likelihood Estimation (TMLE) to Estimate the Effect of Liraglutide on Cardiovascular (CV) Outcomes in Race/Ethnicity Subgroups: Post Hoc Analysis of LEADER

Authors: David Chen, T. Abrahamsen, L. E. Dang, J. Lawson, R. Pratley
Year: 2022
Publication Date: 2022-06-01
Venue: Diabetes
DOI: 10.2337/db22-737-p
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
Important race/ethnicity subgroups at high risk of CV disease are underrepresented in type 2 diabetes CV outcomes trials, limiting the strength of conclusions drawn. Until data are available, TMLE may be more efficient than conventional analysis, giving a better estimation of treatment effects to support clinical decision-making. TMLE makes minimal assumptions on data distribution and prioritizes fitting the most salient parts of the data to the statistical estimand, thus producing more precise and less biased estimates. Relative cumulative risk (RR) of major adverse CV events at 4 years with liraglutide vs. placebo for race/ethnicity subgroups in LEADER was estimated by TMLE + super learner. Comparison to the original Cox regression hazard ratio (HR) and unadjusted Kaplan-Meier RR is shown. Although RR and HR are different estimands, for rare events and proportional hazards they can be numerically close. While the original HR was not significant in any race/ethnicity subgroups, TMLE led to statistically significant treatment effects in all but the ‘Black’ subgroup (Figure). Statistical power to detect the full population RR in the ‘Black’ subgroup was 0.13 (vs. 0.70 in the ‘White’ subgroup). These data provide reassurance on the beneficial effects of liraglutide vs. placebo on CV outcomes in underpowered race/ethnicity subgroups. D.Chen: None. T.J.Abrahamsen: Employee; Novo Nordisk A/S, Stock/Shareholder; Novo Nordisk A/S. L.E.E.Dang: Research Support; Novo Nordisk. J.Lawson: Employee; Novo Nordisk A/S. R.E.Pratley: Other Relationship; Bayer AG, Corcept Therapeutics, Dexcom, Inc., Hanmi Pharm. Co., Ltd., Merck & Co., Inc., Metavention, Novo Nordisk, Pfizer Inc., Poxel SA, Sanofi, Scohia Pharma Inc., Sun Pharmaceutical Industries Ltd. Novo Nordisk A/S
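The remark that RR and HR can be numerically close for rare events under proportional hazards can be checked with a short calculation. Under proportional hazards the treated survival function equals the control survival function raised to the power HR, so the cumulative risk ratio follows directly from the baseline risk. The baseline risks and the hazard ratio of 0.8 below are hypothetical illustration values, not results from LEADER.

```python
def risk_ratio_under_ph(baseline_risk, hazard_ratio):
    """Cumulative risk ratio implied by a hazard ratio under proportional hazards.

    Uses the identity S_treated(t) = S_control(t) ** hazard_ratio.
    """
    control_survival = 1.0 - baseline_risk
    treated_risk = 1.0 - control_survival ** hazard_ratio
    return treated_risk / baseline_risk


hr = 0.8  # hypothetical hazard ratio
for baseline_risk in (0.01, 0.10, 0.50):
    rr = risk_ratio_under_ph(baseline_risk, hr)
    print(f"baseline risk {baseline_risk:.2f}: HR = {hr:.2f}, RR = {rr:.3f}")
```

As the baseline cumulative risk grows, the implied RR moves away from the HR and toward 1, which is why the two estimands should only be compared informally when events are rare.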

2022-05-29 — Comparison of a Target Trial Emulation Framework to Cox Regression to Estimate the Effect of Corticosteroids on COVID-19 Mortality

Authors: K. Hoffman, E. Schenck, M. Satlin, W. Whalen, Division W. Pan, Nicholas T Williams, Iván Díaz
Year: 2022
Publication Date: 2022-05-29
Venue: medRxiv
DOI: 10.1101/2022.05.27.22275037
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Background: Observational research provides a unique opportunity to learn causal effects when randomized trials are not available, but obtaining the correct estimates hinges on a multitude of design and analysis choices. We illustrate the advantages of modern causal inference methods and compare to standard research practice to estimate the effect of corticosteroids on mortality in hospitalized COVID-19 patients in an observational dataset. We use several large RCTs to benchmark our results. Methods: Our retrospective data source consists of 3,293 COVID-19 patients hospitalized at New York Presbyterian March 1-May 15, 2020. We design our study using the Target Trial Emulation framework. We estimate the effect of an intervention consisting of 6 days of corticosteroids administered at the time of severe hypoxia and contrast with an intervention consisting of no corticosteroids administration. The dataset includes dozens of time-varying confounders. We estimate the causal effects using a doubly robust estimator where the probabilities of treatment, outcome, and censoring are estimated using flexible regressions via super learning. We compare these analyses to standard practice in clinical research, consisting of two main methods: (i) Cox models for an exposure of corticosteroids receipt within various time windows of hypoxia, and (ii) a Cox time-varying model where the exposure is daily administration of corticosteroids starting at the time of hospitalization. Results: The effect in our target trial emulation is qualitatively identical to an RCT benchmark, estimated to reduce 28-day mortality from 32% (95% confidence interval: 31-34) to 23% (21-24). The estimated effect from meta-analyses of RCTs for corticosteroids is an odds ratio of 0.66 (0.53-0.82)(1). Hazard ratios from the Cox models range in size and direction from 0.50 (0.41-0.62) to 1.08 (0.80-1.47) and all study designs suffer from various forms of bias. 
Conclusion: We demonstrate in a case study that clinical research based on observational data can unveil true causal relations. However, the correctness of these effect estimates requires designing and analyzing the data based on principles which are different from the current standard in clinical research. The widespread communication and adoption of these design and analytical techniques is of high importance for the improvement of clinical research based on observational data.

2022-05-18 — Emulating a target trial of intensive nurse home visiting in the policy-relevant population using linked administrative data

Authors: M. Moreno-Betancur, J. Lynch, R. Pilkington, H. Schuch, Angela Gialamas, M. Sawyer, C. Chittleborough, S. Schurer, L. Gurrin
Year: 2022
Publication Date: 2022-05-18
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyac092
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Populations willing to participate in randomized trials may not correspond well to policy-relevant target populations. Evidence of effectiveness that is complementary to randomized trials may be obtained by combining the ‘target trial’ causal inference framework with whole-of-population linked administrative data. Methods: We demonstrate this approach in an evaluation of the South Australian Family Home Visiting Program, a nurse home visiting programme targeting socially disadvantaged families. Using de-identified data from 2004–10 in the ethics-approved Better Evidence Better Outcomes Linked Data (BEBOLD) platform, we characterized the policy-relevant population and emulated a trial evaluating effects on child developmental vulnerability at 5 years (n = 4160) and academic achievement at 9 years (n = 6370). Linkage to seven health, welfare and education data sources allowed adjustment for 29 confounders using Targeted Maximum Likelihood Estimation (TMLE) with SuperLearner. Sensitivity analyses assessed robustness to analytical choices. Results: We demonstrated how the target trial framework may be used with linked administrative data to generate evidence for an intervention as it is delivered in practice in the community in the policy-relevant target population, and considering effects on outcomes years down the track. The target trial lens also aided in understanding and limiting the increased measurement, confounding and selection bias risks arising with such data. Substantively, we did not find robust evidence of a meaningful beneficial intervention effect. Conclusions: This approach could be a valuable avenue for generating high-quality, policy-relevant evidence that is complementary to trials, particularly when the target populations are multiply disadvantaged and less likely to participate in trials.

2022-05-18 — A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants

Authors: Chonghao Wang, Jing Zhang, Xin Zhou, Lu Zhang
Year: 2022
Publication Date: 2022-05-18
Venue: bioRxiv
DOI: 10.1101/2022.05.16.492056
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Quantifying an individual’s risk for common diseases is an important goal of precision health. The polygenic risk score (PRS), which aggregates multiple risk alleles of candidate diseases, has emerged as a standard approach for identifying high-risk individuals. A variety of tools have been developed to implement PRS. However, benchmarks for comparatively evaluating the performance of these different methods and for assessing their potential to guide future clinical applications are lacking. Results: We systematically validated and compared thirteen statistical methods, five machine learning models and two ensemble models using simulated data, twenty-two common diseases with internal training sets and four diseases with external summary statistics from the UK Biobank resource. The effects of disease heritability, single nucleotide polymorphism (SNP) effect size and sample size are evaluated using simulated data. We also investigated the correlations between methods and their standard deviations of different diseases. Conclusions: In general, statistical methods outperform machine learning models, and ensemble models, such as Super Learner, generally perform the best for most situations. We observed the correlations were relatively high if the methods were from the same category and the external summary statistics from large cohort GWAS could decrease the standard deviation of method correlations. By varying three factors in the simulated data, we also identified that disease heritability had a strong effect on the predictive performance of individual methods. Both the number and effect sizes of risk SNPs are important, and sample size strongly influences the performance of machine learning models but not that of statistical methods.

2022-05-17 — Targeted Learning: Toward a Future Informed by Real-World Evidence

Authors: Susan Gruber, Rachael V. Phillips, Hana Lee, M. Ho, J. Concato, M. J. van der Laan
Year: 2022
Publication Date: 2022-05-17
Venue: Statistics in Biopharmaceutical Research
DOI: 10.1080/19466315.2023.2182356
Link: Semantic Scholar
Matched Keywords: super learning, targeted minimum loss based estimation

Abstract:
The 21st Century Cures Act of 2016 includes a provision for the U.S. Food and Drug Administration (FDA) to evaluate the potential use of Real-World Evidence (RWE) to support new indications for use for previously approved drugs, and to satisfy post-approval study requirements. Extracting reliable evidence from Real-World Data (RWD) is often complicated by a lack of treatment randomization, potential intercurrent events, and informative loss to follow-up. Targeted Learning (TL) is a sub-field of statistics that provides a rigorous framework to help address these challenges. The TL Roadmap offers a step-by-step guide to generating valid evidence and assessing its reliability. Following these steps produces an extensive amount of information for assessing whether the study provides reliable scientific evidence, including in support of regulatory decision-making. This article presents two case studies that illustrate the utility of following the roadmap. We used targeted minimum loss-based estimation combined with super learning to estimate causal effects. We also compared these findings with those obtained from an unadjusted analysis, propensity score matching, and inverse probability weighting. Nonparametric sensitivity analyses illuminate how departures from (untestable) causal assumptions affect point estimates and confidence interval bounds that would impact the substantive conclusion drawn from the study. TL’s thorough approach to learning from data provides transparency, allowing trust in RWE to be earned whenever it is warranted.

2022-05-13 — A Huber loss-based super learner with applications to healthcare expenditures

Authors: Ziyue Wu, David Benkeser
Year: 2022
Publication Date: 2022-05-13
Venue: arXiv.org
DOI: 10.48550/arXiv.2205.06870
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Complex distributions of the healthcare expenditure pose challenges to statistical modeling via a single model. Super learning, an ensemble method that combines a range of candidate models, is a promising alternative for cost estimation and has shown benefits over a single model. However, standard approaches to super learning may have poor performance in settings where extreme values are present, such as healthcare expenditure data. We propose a super learner based on the Huber loss, a “robust” loss function that combines squared error loss with absolute loss to down-weight the influence of outliers. We derive oracle inequalities that establish bounds on the finite-sample and asymptotic performance of the method. We show that the proposed method can be used both directly to optimize Huber risk, as well as in finite-sample settings where optimizing mean squared error is the ultimate goal. For this latter scenario, we provide two methods for performing a grid search for values of the robustification parameter indexing the Huber loss. Simulations and real data analysis demonstrate appreciable finite-sample gains in cost prediction and causal effect estimation using our proposed method. include the following measures from patient self-reported questionnaires collected at baseline: (1) Socio-demographics (age, sex, employment status, etc.); (2) Pain-related characteristics (back/leg pain duration, back/leg pain intensity, modified Roland-Morris Disability Questionnaire, Brief Pain Inventory Activity Interference Scale); (3) PHQ-4 measure of anxiety and depressive symptoms; (4) European Quality of Life 5 Dimension (EQ5D) index and Visual Analog Scale; (5) Number of falls; and (6) Recovery expectation. Besides, we also include the Quan comorbidity score, baseline diagnosis, and total RVUs at one year before index visit from EHR as covariates.
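As a rough illustration of the Huber loss described in the abstract above, the sketch below uses the standard textbook form: quadratic for small residuals and linear beyond a threshold. The threshold name `delta`, the helper names, and the toy cost values are assumptions made for this example; the paper's exact parameterization of the robustification parameter may differ.

```python
def huber_loss(residual, delta):
    """Textbook Huber loss: squared error near zero, absolute error in the tails."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual ** 2
    # Linear branch, shifted so the two pieces meet at |residual| == delta.
    return delta * (magnitude - 0.5 * delta)


def mean_huber_risk(y_true, y_pred, delta):
    """Average Huber loss of a set of predictions."""
    losses = [huber_loss(y - p, delta) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def mean_squared_error(y_true, y_pred):
    """Average squared-error loss, for comparison."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Made-up expenditure data with one extreme value.
costs = [100.0, 120.0, 90.0, 110.0, 5000.0]
predictions = [105.0] * len(costs)

print("MSE:       ", mean_squared_error(costs, predictions))
print("Huber risk:", mean_huber_risk(costs, predictions, delta=50.0))
```

The single extreme cost dominates the mean squared error but contributes only linearly to the Huber risk, which is the down-weighting of outliers that the abstract refers to.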

2022-05-02 — Mousika: Enable General In-Network Intelligence in Programmable Switches by Knowledge Distillation

Authors: Guorui Xie, Qing Li, Yutao Dong, Guang-Hua Duan, Yong Jiang, Jingpu Duan
Year: 2022
Publication Date: 2022-05-02
Venue: IEEE Conference on Computer Communications
DOI: 10.1109/INFOCOM48880.2022.9796936
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Given the power efficiency and Tbps throughput of packet processing, several works are proposed to offload the decision tree (DT) to programmable switches, i.e., in-network intelligence. Though the DT is suitable for the switches’ match-action paradigm, it has several limitations. E.g., its range match rules may not be supported well due to the hardware diversity; and its implementation also consumes lots of switch resources (e.g., stages and memory). Moreover, as learning algorithms (particularly deep learning) have shown their superior performance, some more complicated learning models are emerging for networking. However, their high computational complexity and large storage requirement cause challenges in the deployment on switches. Therefore, we propose Mousika, an in-network intelligence framework that addresses these drawbacks successfully. First, we modify the DT to the Binary Decision Tree (BDT). Compared with the DT, our BDT supports faster training, generates fewer rules, and satisfies switch constraints better. Second, we introduce the teacher-student knowledge distillation in Mousika, which enables the general translation from other learning models to the BDT. Through the translation, we can not only utilize the super learning capabilities of complicated models, but also avoid the computation/memory constraints when deploying them on switches directly for line-speed processing.

2022-05-02 — METHOD SUPER LEARNING FOR DETERMINATION OF MOLECULAR RELATIONSHIP

Authors: A. Gurbych
Year: 2022
Publication Date: 2022-05-02
Venue: Herald of Khmelnytskyi National University. Technical sciences
DOI: 10.31891/2307-5732-2022-307-2-14-24
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
This paper uses the Super Learning principle to predict the molecular affinity between the receptor (large biomolecule) and ligands (small organic molecules). Meta-models learn the optimal combination of individual base models in two consecutive ensembles: classification and regression. Each ensemble contains six machine learning models, which are combined by stacking. Base models include support vector machines, random forest, gradient boosting, graph neural networks, feedforward networks, and transformers. The first ensemble predicts binding probability and classifies all candidate molecules to the selected receptor into active and inactive. Ligands recognized as active by the first ensemble are fed to the second ensemble, which predicts the degree of their affinity for the receptor in the form of an inhibition constant (Ki). A feature of the method is the rejection of the use of atomic coordinates of individual molecules and their complexes, thus eliminating experimental errors in sample preparation and measurement of nuclear coordinates and allowing the method to determine the affinity of biomolecules with unknown spatial configurations. It is shown that meta-learning increases the recall of the classification ensemble by 34.9% and the coefficient of determination (R2) of the regression ensemble by 21% compared to the average values. This paper shows that an ensemble with meta-stacking is an asymptotically optimal system for learning. The feature of Super Learning is to use k-fold cross-validation to form first-level predictions that train second-level models, or meta-models, that combine first-level models optimally. The ability to predict the molecular affinity of six machine learning models is studied, and the efficiency improvement is due to the combination of models in the ensemble by the stacking method. Models that are combined into two consecutive ensembles are shown.
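The k-fold stacking idea mentioned in the abstract above can be sketched in a few lines. This is a deliberately small illustration and not the paper's pipeline: the two base learners (a constant-mean predictor and a simple linear regression), the grid search over a single convex weight, and all function names are assumptions chosen to keep the example dependency-free.

```python
def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Base learner 2: ordinary least squares with one covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cross_val_predictions(learners, xs, ys, k):
    """First level: out-of-fold predictions of every base learner."""
    n = len(xs)
    z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j, fit in enumerate(learners):
            predictor = fit(train_x, train_y)
            for i in held_out:
                z[i][j] = predictor(xs[i])
    return z


def fit_meta_weight(z, ys, grid_size=101):
    """Second level: convex weight on learner 0 minimising cross-validated MSE."""
    best_alpha, best_risk = 0.0, float("inf")
    for step in range(grid_size):
        alpha = step / (grid_size - 1)
        risk = sum((alpha * row[0] + (1 - alpha) * row[1] - y) ** 2
                   for row, y in zip(z, ys)) / len(ys)
        if risk < best_risk:
            best_alpha, best_risk = alpha, risk
    return best_alpha


def super_learner(xs, ys, k=5):
    """Refit base learners on all data and combine them with the meta weight."""
    learners = [fit_mean, fit_line]
    alpha = fit_meta_weight(cross_val_predictions(learners, xs, ys, k), ys)
    mean_fit, line_fit = (fit(xs, ys) for fit in learners)
    return (lambda x: alpha * mean_fit(x) + (1 - alpha) * line_fit(x)), alpha


# Toy data that is exactly linear, so the linear learner should get all the weight.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
predict, alpha = super_learner(xs, ys)
print("weight on mean learner:", alpha, "prediction at x=25:", predict(25.0))
```

The key design point is that the meta weight is chosen from out-of-fold predictions only, so a base learner cannot earn weight simply by overfitting its own training data.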

2022-04-15 — Machine-learning derived algorithms for prediction of radiographic progression in early axial spondyloarthritis.

Authors: R. Garofoli, M. Resche-Rigon, C. Roux, D. van der Heijde, M. Dougados, A. Moltó
Year: 2022
Publication Date: 2022-04-15
Venue: Clinical and Experimental Rheumatology
DOI: 10.55563/clinexprheumatol/mm2uzu
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
OBJECTIVES: To compare machine learning (ML) to traditional models to predict radiographic progression in patients with early axial spondyloarthritis (axSpA). METHODS: We carried out a prospective French multicentric DESIR cohort study with 5 years of follow-up that included patients with chronic back pain for <3 years, suggestive of axSpA. Radiographic progression was defined as progression at the spine (increase of at least 1 point of mSASSS scores/2 years) or at the sacroiliac joint (worsening of at least one grade of the mNY score between 2 visits). Statistical analyses were based on patients without any missing data regarding the outcome and variables of interest (295 patients). Traditional modelling: we performed a multivariate logistic regression model (M1); then variable selection with stepwise selection based on Akaike Information Criterion (stepAIC) method (M2), and Least Absolute Shrinkage and Selection Operator (LASSO) method (M3). ML modelling: using "SuperLearner" package on R, we modelled radiographic progression with stepAIC, LASSO, random forest, Discrete Bayesian Additive Regression Trees Samplers (DBARTS), Generalized Additive Models (GAM), multivariate adaptive polynomial spline regression (polymars), Recursive Partitioning And Regression Trees (RPART) and Super Learner. Accuracy of these models was compared based on their 10-fold cross-validated AUC (cv-AUC). RESULTS: 10-fold cv-AUC for traditional models were 0.79 and 0.78 for M2 and M3, respectively. The three best models in the ML algorithms were the GAM, the DBARTS and the Super Learner models, with 10-fold cv-AUC of: 0.77, 0.76 and 0.74, respectively. CONCLUSIONS: Two traditional models predicted radiographic progression as good as the eight ML models tested in this population.

2022-04-13 — Practical considerations for specifying a super learner.

Authors: Rachael V. Phillips, M. J. van der Laan, Hana Lee, Susan Gruber
Year: 2022
Publication Date: 2022-04-13
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyad023
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Common tasks encountered in epidemiology, including disease incidence estimation and causal inference, rely on predictive modelling. Constructing a predictive model can be thought of as learning a prediction function (a function that takes as input covariate data and outputs a predicted value). Many strategies for learning prediction functions from data (learners) are available, from parametric regressions to machine learning algorithms. It can be challenging to choose a learner, as it is impossible to know in advance which one is the most suitable for a particular dataset and prediction task. The super learner (SL) is an algorithm that alleviates concerns over selecting the one 'right' learner by providing the freedom to consider many, such as those recommended by collaborators, used in related research or specified by subject-matter experts. Also known as stacking, SL is an entirely prespecified and flexible approach for predictive modelling. To ensure the SL is well specified for learning the desired prediction function, the analyst does need to make a few important choices. In this educational article, we provide step-by-step guidelines for making these decisions, walking the reader through each of them and providing intuition along the way. In doing so, we aim to empower the analyst to tailor the SL specification to their prediction task, thereby ensuring their SL performs as well as possible. A flowchart provides a concise, easy-to-follow summary of key suggestions and heuristics, based on our accumulated experience and guided by SL optimality theory.

2022-03-31 — Exploring modern machine learning methods to improve causal-effect estimation

Authors: Yeji Kim, Tae-Kil Choi, Sangbum Choi
Year: 2022
Publication Date: 2022-03-31
Venue: Communications for Statistical Applications and Methods
DOI: 10.29220/csam.2022.29.2.177
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
This paper addresses the use of machine learning methods for causal estimation of treatment effects from observational data. Even though conducting randomized experimental trials is a gold standard to reveal potential causal relationships, observational study is another rich source for investigation of exposure effects, for example, in the research of comparative effectiveness and safety of treatments, where the causal effect can be identified if covariates contain all confounding variables. In this context, statistical regression models for the expected outcome and the probability of treatment are often imposed, which can be combined in a clever way to yield more efficient and robust causal estimators. Recently, targeted maximum likelihood estimation and causal random forest is proposed and extensively studied for the use of data-adaptive regression in estimation of causal inference parameters. Machine learning methods are a natural choice in these settings to improve the quality of the final estimate of the treatment effect. We explore how we can adapt the design and training of several machine learning algorithms for causal inference and study their finite-sample performance through simulation experiments under various scenarios. Application to the percutaneous coronary intervention (PCI) data shows that these adaptations can improve simple linear regression-based methods.

2022-03-23 — Explaining Practical Differences Between Treatment Effect Estimators with High Dimensional Asymptotics

Authors: Steve Yadlowsky
Year: 2022
Publication Date: 2022-03-23
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
We revisit the classical causal inference problem of estimating the average treatment effect in the presence of fully observed confounding variables using two-stage semiparametric methods. In existing theoretical studies of methods such as G-computation, inverse propensity weighting (IPW), and two common doubly robust estimators -- augmented IPW (AIPW) and targeted maximum likelihood estimation (TMLE) -- they are either bias-dominated, or have similar asymptotic statistical properties. However, when applied to real datasets, they often appear to have notably different variance. We compare these methods when using a machine learning (ML) model to estimate the nuisance parameters of the semiparametric model, and highlight some of the important differences. When the outcome model estimates have little bias, which is common among some key ML models, G-computation and the TMLE outperform the other estimators in both bias and variance. We show that the differences can be explained using high-dimensional statistical theory, where the number of confounders $d$ is of the same order as the sample size $n$. To make this theoretical problem tractable, we posit a generalized linear model for the effect of the confounders on the treatment assignment and outcomes. Despite making parametric assumptions, this setting is a useful surrogate for some machine learning methods used to adjust for confounding in two-stage semiparametric methods. In particular, the estimation of the first stage adds variance that does not vanish, forcing us to confront terms in the asymptotic expansion that normally are brushed aside as finite sample defects. However, our model emphasizes differences in performance between these estimators beyond first-order asymptotics.
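For readers comparing the estimators named in the abstract above, the sketch below writes out the usual plug-in formulas for G-computation, IPW, and AIPW for the average treatment effect. It takes the nuisance estimates (outcome predictions under treatment and control, and propensity scores) as given inputs; how those are fitted is outside this sketch, TMLE is omitted because it needs an additional targeting step, and the toy numbers and function names are invented for illustration.

```python
def g_computation(q1, q0):
    """Average difference of outcome-model predictions under treatment and control."""
    return sum(a - b for a, b in zip(q1, q0)) / len(q1)


def ipw(treatment, outcome, propensity):
    """Inverse propensity weighted difference in mean outcomes."""
    terms = [
        a * y / g - (1 - a) * y / (1 - g)
        for a, y, g in zip(treatment, outcome, propensity)
    ]
    return sum(terms) / len(terms)


def aipw(treatment, outcome, propensity, q1, q0):
    """Augmented IPW: outcome-model contrast plus weighted residual corrections."""
    terms = [
        m1 - m0 + a * (y - m1) / g - (1 - a) * (y - m0) / (1 - g)
        for a, y, g, m1, m0 in zip(treatment, outcome, propensity, q1, q0)
    ]
    return sum(terms) / len(terms)


# Toy inputs (hypothetical): four units, known propensity 0.5, and outcome
# predictions that happen to match the observed outcomes exactly.
treatment = [1, 0, 1, 0]
outcome = [3.0, 1.0, 4.0, 2.0]
propensity = [0.5, 0.5, 0.5, 0.5]
q1 = [3.0, 3.0, 4.0, 4.0]  # predicted outcome if treated
q0 = [1.0, 1.0, 2.0, 2.0]  # predicted outcome if untreated

print("G-computation:", g_computation(q1, q0))
print("IPW:          ", ipw(treatment, outcome, propensity))
print("AIPW:         ", aipw(treatment, outcome, propensity, q1, q0))
```

In this toy setting all three estimators agree because both nuisance inputs are correct; they generally diverge once the outcome model or the propensity model is estimated with error, which is the regime the abstract studies.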

2022-03-19 — CausalDeepCENT: Deep Learning for Causal Prediction of Individual Event Times

Authors: Jong-Hyeon Jeong, Yichen Jia
Year: 2022
Publication Date: 2022-03-19
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Deep learning (DL) has recently drawn much attention in image analysis, natural language processing, and high-dimensional medical data analysis. Under the causal directed acyclic graph (DAG) interpretation, the input variables without incoming edges from parent nodes in the DL architecture may be assumed to be randomized and independent of each other. As in a regression setting, including the input variables in the DL algorithm would reduce the bias from the potential confounders. However, failing to include a potential latent causal structure among the input variables affecting both treatment assignment and the output variable could be an additional significant source of bias. The primary goal of this study is to develop new DL algorithms to estimate causal individual event times for time-to-event data, equivalently to estimate the causal time-to-event distribution with or without right censoring, accounting for the potential latent structure among the input variables. Once the causal individual event times are estimated, it would be straightforward to estimate the causal average treatment effects as the differences in the averages of the estimated causal individual event times. A connection is made between the proposed method and the targeted maximum likelihood estimation (TMLE). Simulation studies are performed to assess improvement in prediction abilities of the proposed methods by using the mean square error (MSE)-based method and rank-based C-Index metric. The simulation results indicate that improvement on the prediction accuracy could be substantial particularly when there is a collider among the input variables. The proposed method is illustrated with a publicly available and influential breast cancer data set. The proposed method has been implemented by using PyTorch and uploaded at https://github.com/yicjia/CausalDeepCENT.

2022-03-16 — Association between Preoperative hs-CRP/Albumin Ratio and Postoperative SIRS in Elderly Patients: A Retrospective Observational Cohort Study

Authors: C. Chen, X. Chen, J. Chen, J. Xing, Z. Hei, Q. Zhang, Z. Liu, Shaoli Zhou
Year: 2022
Publication Date: 2022-03-16
Venue: The Journal of Nutrition, Health & Aging
DOI: 10.1007/s12603-022-1761-4
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Systemic inflammatory response syndrome (SIRS) is one of the severe postoperative complications in elderly patients and seriously affects their prognosis and survival rate. Heretofore, there have been no reliable and accurate methods to predict postoperative SIRS in elderly patients. The aim of this study was to determine whether increased preoperative hs-CRP/albumin ratio (CAR) was associated with postoperative SIRS in elderly population. The data of patients aged ≥ 65 years who underwent general anesthesia in two centers of Third Affiliated Hospital of Sun Yat-sen University between January 2015 and September 2020 were retrieved and analyzed. Based on the perioperative dataset, we used the targeted maximum likelihood estimation (TMLE) to estimate the association between preoperative CAR and postoperative SIRS in elderly population. Patients’ CAR was calculated and divided into two groups (< 0.278 and ≥ 0.278) according to its normal range in our hospital. Adjusted odds ratios (aORs) and 95% confidence intervals (CIs) were calculated respectively. Further sensitivity analyses were conducted to evaluate the robustness of the results. A total of 16141 elderly patients were assessed and 7009 of them were enrolled in the final analysis, and 1674 (23.9%) patients developed SIRS within 3 days after surgery. Compared with non-SIRS patients, patients with SIRS had a significantly longer postoperative hospitalization, higher cost and higher risk of in-hospital mortality. Compared with patients with preoperative CAR < 0.278, we found that CAR ≥ 0.278 had a significantly higher risk for the development of postoperative SIRS after multivariable adjustment [aOR = 1.27; 95% CI (1.21, 1.33)].
The interaction effect of preoperative CAR ≥ 0.278 and SIRS was stronger among patients with the following characteristics: aged ≥ 75 years, male, comorbid with diabetes mellitus and admitted to ICU after surgery, duration of surgery < 120 minutes, underwent cerebral surgery or skin, spine and joint surgery (all P < 0.001). The above results remained robust in the sensitivity analysis. Preoperative CAR ≥ 0.278 was significantly associated with increased risk of postoperative SIRS in elderly patients. Special attention should be paid to elderly patients with a preoperative CAR ≥ 0.278 so as to reduce the incidence of postoperative SIRS.

2022-03-14 — A Novel Empirical and Deep Ensemble Super Learning Approach in Predicting Reservoir Wettability via Well Logs

Authors: Daniel Asante Otchere, Mohammed Abdalla Ayoub Mohammed, T. Ganat, R. Gholami, Z. M. Aljunid Merican
Year: 2022
Publication Date: 2022-03-14
Venue: Applied Sciences
DOI: 10.3390/app12062942
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Accurately measuring wettability is of the utmost importance because it influences several reservoir parameters while also impacting reservoir potential, recovery, development, and management plan. As such, this study proposes a new formulated mathematical model based on the correlation between the Amott-USBM wettability measurement and field NMR T2LM log. The exponential relationship based on the existence of immiscible fluids in the pore space had a correlation coefficient of 0.95. Earlier studies on laboratory core wettability measurements using T2 distribution as a function of increasing water saturation were modified to include T2LM field data. Based on the trends observed, water-wet and oil-wet conditions were qualitatively identified. Using the mean T2LM for the intervals of interest and the formulated mathematical formula, the various wetting conditions in existence were quantitatively measured. Results of this agreed with the various core wettability measurements used to develop the mathematical equation. The results expressed the validity of the mathematical equation to characterise wettability at the field scale. With the cost of running NMR logs not favourable, and hence not always run, a deep ensemble super learner was employed to establish a relationship between NMR T2LM and wireline logs. This model is based on the architecture of a deep learning model and the theoretical background of ensemble models due to their reported superiority. The super learner was developed using nine ensemble models as base learners. The performance of nine ensemble models was compared to the deep ensemble super learner. Based on the RMSE, R2, MAE, MAPD and MPD the deep ensemble super learner greatly outperformed the base learners. This indicates that the deep ensemble super learner can be used to predict NMR T2LM in the field. 
By applying the methodology and mathematical formula proposed in this study, the wettability of reservoirs can be accurately characterised as illustrated in the field deployment.

2022-03-09 — SuperCone: Unified User Segmentation over Heterogeneous Experts via Concept Meta-learning

Authors: Keqian Li, Yifan Hu
Year: 2022
Publication Date: 2022-03-09
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
We study the problem of user segmentation: given a set of users and one or more predefined groups or segments, assign users to their corresponding segments. As an example, for a segment indicating particular interest in a certain area of sports or entertainment, the task will be to predict whether each single user will belong to the segment. However, there may exist numerous long tail prediction tasks that suffer from data availability and may be of heterogeneous nature, which make it hard to capture using single off the shelf model architectures. In this work, we present SuperCone, our unified predicative segments system that addresses the above challenges. It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints, and uniformly models each of the prediction tasks using an approach called "super learning", that is, combining prediction models with diverse architectures or learning methods that are not compatible with each other. Following this, we provide an end to end approach that learns to flexibly attend to best suited heterogeneous experts adaptively, while at the same time incorporating deep representations of the input concepts that augment the above experts. Experiments show that SuperCone significantly outperforms state-of-the-art recommendation and ranking algorithms on a wide range of predicative segment tasks and public structured data learning benchmarks.

2022-03-01 — The Influence of Concomitant Hammertoe Correction on Postoperative Outcomes in Patients Undergoing Hallux Valgus Correction

Authors: L. Rajan, R. Fuller, Jiaqi Zhu, Tonya W. An, S. Ellis
Year: 2022
Publication Date: 2022-03-01
Venue: Foot and Ankle Surgery
DOI: 10.1177/2473011421S00543
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Category: Bunion; Lesser Toes Introduction/Purpose: Patients with hallux valgus often develop secondary hammertoe deformities of the lesser toes. Chronic lateral deviation of the hallux leads to proximal interphalangeal joint flexion contracture, metatarsophalangeal joint subluxation and ultimately a cross-toe deformity. Operative management of bunions with hammertoe is more extensive since both the primary bunion deformity and the secondary defect have to be corrected simultaneously; however, it is unclear whether simultaneous bunion and hammertoe correction affects patient outcomes. The objective of this study was to compare postoperative patient reported outcomes using the patient-reported outcome measure information system (PROMIS) scores and radiographic outcomes between patients who underwent isolated bunion deformity correction and patients who underwent operative repair of hallux valgus with concomitant hammertoe correction. Methods: This retrospective cohort study included patients over the age of 18 who were treated operatively by 1 of 7 fellowship-trained foot and ankle surgeons for hallux valgus. Those with clinically symptomatic hammertoes were also corrected at the surgeon's discretion. All patients had minimum 1-year postoperative PROMIS scores and minimum 3-month postoperative radiographs. Preoperative, final postoperative and change in PROMIS scores from 6 domains (physical function, pain interference, pain intensity, global physical and mental health, and depression) were compared between the isolated bunion and bunion with hammertoe correction groups. Radiographic measurements compared between cohorts included hallux valgus angle (HVA), intermetatarsal angle (IMA), distal metatarsal-articular angle (DMAA), and Meary's angle. Radiographic parameters were measured on anteroposterior (AP) and lateral weightbearing radiographs. Statistical analysis utilized targeted minimum-loss estimation (TMLE) to control for confounders (age, gender, BMI). 
Results: A total of 221 patients (134 with isolated bunion correction, 87 with concomitant hammertoe correction) with an average of 19.2 months follow-up were included in this study. Demographically, patients in the concomitant hammertoe cohort were older than the isolated bunion group (58.5 vs 53.1, p<0.01) and had a higher BMI (24.8 vs 58.5, p<0.05). Both cohorts demonstrated improvement in all PROMIS domains except for global mental health and depression. The isolated bunion cohort had significantly better improvements in pain interference and pain intensity when compared to the concomitant hammertoe group (p<0.01, p<0.05 respectively) (Table 1). The isolated bunion cohort had lower postoperative pain interference scores (p<.01). Radiographically, the concomitant hammertoe group had a higher preoperative HVA than the isolated bunion group (30 vs 27.6, p<.05). There were no statistically significant differences between the two cohorts in postoperative radiographic parameters, in addition to the probability of achieving normal postoperative radiographic measures. Conclusion: Patients undergoing simultaneous bunion and hammertoe correction experienced worse postoperative outcomes measured by PROMIS pain interference, had significantly less improvement in PROMIS pain interference and pain intensity scores, and had more severe preoperative radiographic deformity when compared to those who underwent isolated bunion correction. The relationship between hallux valgus and hammertoe development should be considered when counseling patients for surgery. Patients with hallux valgus who show early signs of hammertoe formation such as second toe elevation or pain under the second metatarsal head may benefit from earlier bunion correction before the hammertoe progresses to the point of needing surgical management.

2022-03-01 — Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations

Authors: Wenhua Yu, Shanshan Li, Tingting Ye, R. Xu, Jiangning Song, Yuming Guo
Year: 2022
Publication Date: 2022-03-01
Venue: Environmental Health Perspectives
DOI: 10.1289/EHP9752
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Accurate estimation of historical PM2.5 (particle matter with an aerodynamic diameter of less than 2.5μm) is critical and essential for environmental health risk assessment. Objectives: The aim of this study was to develop a multiple-level stacked ensemble machine learning framework for improving the estimation of the daily ground-level PM2.5 concentrations. Methods: An innovative deep ensemble machine learning framework (DEML) was developed to estimate the daily PM2.5 concentrations. The framework has a three-stage structure: At the first stage, four base models [gradient boosting machine (GBM), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost)] were used to generate a new data set of PM2.5 concentrations for training the next-stage learners. At the second stage, three meta-models [RF, XGBoost, and Generalized Linear Model (GLM)] were used to estimate PM2.5 concentrations using a combination of the original data set and the predictions from the first-stage models. At the third stage, a nonnegative least squares (NNLS) algorithm was employed to obtain the optimal weights for PM2.5 estimation. We took the data from 133 monitoring stations in Italy as an example to implement the DEML to predict daily PM2.5 at each 1km×1km grid cell from 2015 to 2019 across Italy. We evaluated the model performance by performing 10-fold cross-validation (CV) and compared it with five benchmark algorithms [GBM, SVM, RF, XGBoost, and Super Learner (SL)]. Results: The results revealed that the PM2.5 prediction performance of DEML [coefficients of determination (R2)=0.87 and root mean square error (RMSE)=5.38μg/m3] was superior to any benchmark models (with R2 of 0.51, 0.76, 0.83, 0.70, and 0.83 for GBM, SVM, RF, XGBoost, and SL approach, respectively). DEML displayed reliable performance in capturing the spatiotemporal variations of PM2.5 in Italy. 
Discussion: The proposed DEML framework achieved an outstanding performance in PM2.5 estimation, which could be used as a tool for more accurate environmental exposure assessment. https://doi.org/10.1289/EHP9752
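
The third stage described in this abstract (nonnegative least squares over model predictions) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `nnls_weights`, the projected-gradient solver, and the toy data are all assumptions made for illustration, and a production analysis would normally use a dedicated NNLS routine.

```python
def nnls_weights(preds, y, n_iter=2000):
    """Nonnegative least squares via projected gradient descent (illustrative).

    preds: list of rows, each row holding one prediction per model.
    y: list of observed outcomes, same length as preds.
    Returns one nonnegative weight per model.
    """
    k = len(preds[0])
    # Step size 1/L, where L = trace(P^T P) bounds the largest eigenvalue of P^T P.
    lipschitz = sum(p * p for row in preds for p in row)
    step = 1.0 / lipschitz
    w = [0.0] * k
    for _ in range(n_iter):
        # Residuals r = P w - y
        resid = [sum(wj * pj for wj, pj in zip(w, row)) - yi
                 for row, yi in zip(preds, y)]
        # Gradient of 0.5 * ||P w - y||^2 is P^T r
        grad = [sum(row[j] * r for row, r in zip(preds, resid)) for j in range(k)]
        # Gradient step, then projection onto the constraint w >= 0
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: the outcome equals 0.7 * model_1 + 0.3 * model_2,
# and model_3 carries no useful signal.
preds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [0.7, 0.3, 0.0, 1.0]
weights = nnls_weights(preds, y)
```

On this toy input the weights converge to approximately 0.7, 0.3, and 0, so the uninformative model is effectively dropped from the ensemble.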

2022-03-01 — Abstract 060: Association Of Antecedent Statin Use With Outcomes Of People With Covid-19 Admitted At Northwestern Medicine Health System

Authors: A. Rivera, Omar Al-Heeti, Lucia Petito, Janna L. Williams, Matthew J. Feinstein, B. Taiwo, C. Achenbach
Year: 2022
Publication Date: 2022-03-01
Venue: Circulation
DOI: 10.1161/circ.145.suppl_1.060
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Several observational studies have found that antecedent statin use (i.e., use prior to getting admitted) was associated with lower mortality risk in hospitalized COVID-19 patients, but this is not a consistent finding. Differences may be due to covariate imbalance, model misspecification, or selection bias. Objective: Estimate the association of antecedent statin use with adverse outcomes (in-hospital death, intubation, ICU admission) in patients admitted for COVID-19 in an academic health system in Chicago. Methods: We analyzed electronic health records from an academic health system in Chicago (Mar ’20-Mar ’21) comparing rates of adverse events (composite and per outcome) between antecedent users and non-users. Eligible individuals were ≥40 years old in Illinois, admitted for ≥24 hours, and tested positive for COVID-19 in the 30 days before to 7 days after admission. Antecedent use is defined as existence of statins prescription ≥30 days before admission. We used augmented inverse probability weighting (AIPW) with targeted maximum likelihood estimation to improve covariate balance and estimate the risk difference. Compared to standard methods, this approach allowed use of machine learning models and is doubly robust to misspecification. Results: Of 6267 admitted, 1337 (20%) were antecedent users. Users tend to be older, male, White, smoke, and have a comorbidity. Unadjusted analysis showed significantly higher rates of negative outcomes in non-users except in-hospital death. Analysis using AIPW improved covariate balance and showed that users had significantly lower rates of the composite outcome (RD: -3.9%, 95%CI: -6.0, -1.9) and ICU admissions (RD: -4.0, 95%CI: -7.0, -1.0). No differences in intubation and mortality rates were detected. Conclusion: Antecedent statin use is associated with lower risk of ICU admissions but not with intubation or in-hospital mortality.
We were not able to confirm the mortality benefit detected by prior studies nor any differences in rates of intubations.

2022-02-23 — Super Learner for Malicious URL Detection

Authors: Asela Hevapathige, Kavindu Rathnayake
Year: 2022
Publication Date: 2022-02-23
Venue: 2022 2nd International Conference on Advanced Research in Computing (ICARC)
DOI: 10.1109/ICARC54489.2022.9753802
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Malicious Uniform Resource Locator (URL) detection is one of the prominent research areas in Cyber security. Machine learning and statistical models are mainly used for this task due to their ability to adapt complex patterns. This research study mainly focused on implementing a machine learning classifier model using Super Learner ensemble to classify malicious URLs. Static feature set is extracted using only the URL information with less latency and reduced computational complexity to support offline and real-time detection. Proposed binary classifier model is used to separate malicious URLs from benign ones whereas the proposed multi-class classifier model separates URLs into benign and multiple categories of attacks (phishing, malware, spam and defacement). These classifiers are tested on a dataset comprising around 750,000 URLs. The empirical results show that the proposed model works well in malicious URL detection. The binary classifier provides 95.145% accuracy and 96.844% precision whereas the multi-class classifier provides 94.69% accuracy and 96.234% precision. Also, the comparison results show that the proposed model outperforms leading supervised machine learning algorithms in malicious URL detection.

2022-02-22 — Artificial intelligence‐based super learner approach for prediction and optimization of biodiesel synthesis—A case of waste utilization

Authors: S. Zakir Hossain, N. Sultana, Muhammad Faisal Irfan, S. Manirul Haque, Nawaf Nasr, S. Razzak
Year: 2022
Publication Date: 2022-02-22
Venue: International Journal of Energy Research
DOI: 10.1002/er.7764
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In this article, super learner approaches such as hybrid Bayesian Optimization Algorithm‐Support Vector Regression (BOA‐SVR), Bayesian Optimization Algorithm‐Boosted Regression Tree (BOA‐BRT), along with a statistical method (response surface methodology, RSM) were utilized as potential tools for predicting biodiesel synthesis using waste date seed oil as feedstock. Novelties of this investigation comprise (a) hybridization of BOA with each artificial intelligence (AI) approach resulting in the formation of BOA‐SVR and BOA‐BRT super learner models, (b) the model performance was compared using several performance indicators including coefficient of determination (R2), mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), etc., (c) validation of the model was confirmed using extra simulated data, (d) the crow search algorithm (CSA) was integrated with the BOA‐SVR resulting advanced super learner model (BOA‐SVR‐CSA) for finding the global optimal point. BOA‐BRT model provided relatively low R2 (0.81) and high errors (MAE of 8.5159, RSME of 12.4674, MAPE of 106.0391). RSM model was statistically significant (P‐value <.05) with relatively high R2 (0.95) and moderate errors (MAE of 4.8886, RSME of 5.5964, MAPE of 22.1574). The BOA‐SVR model provided low errors (MAE of 3.8342, RSME of 3.8884, MAPE of 18.91) with a high R2 of 0.98. The overall results suggested that the BOA‐SVR model performs better with increased accuracy than other models. The extra simulated data further confirmed the prediction capability of the developed super learner model (BOA‐SVR). The maximum biodiesel yield of 91.35% was achieved with a KOH dose of 0.6 wt%, M:O of 7:1 at a reaction time of 2 hours using the advanced super learner model (BOA‐SVR‐CSA). Overall, this novel platform could be of considerable promise in other process modeling and multiobjective optimization applications.

2022-02-20 — Register variation in spoken and written language use across technology-mediated and non-technology-mediated learning environments

Authors: K. Kyle, Masaki Eguchi, Ann Tai Choe, Geoffrey T. LaFlair
Year: 2022
Publication Date: 2022-02-20
Venue: Language Testing
DOI: 10.1177/02655322211057868
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
In the realm of language proficiency assessments, the domain description inference and the extrapolation inference are key components of a validity argument. Biber et al.’s description of the lexicogrammatical features of the spoken and written registers in the T2K-SWAL corpus has served as support for the TOEFL iBT test’s domain description and extrapolation inferences. In the time since the T2K-SWAL corpus was collected, however, university learning environments have increasingly become technology-mediated. Accordingly, any description of the linguistic features of university language should account for the language produced in technology-mediated learning environments (TMLEs) in addition to non-technology-mediated learning environments (non-TMLEs). Kyle et al. recently began to address this issue by collecting a corpus of TMLE language use, which they then compared to language use in non-TMLEs using multidimensional analysis (MDA). The results indicated both similarities and substantive differences across the learning environments, but the study did not investigate the effects of particular registers on these results. In this study, we build on previous research by investigating lexicogrammatical features of specific spoken and written registers across technology-mediated and non-technology-mediated learning environments.

2022-02-16 — Machine Learning for Outcome Prediction in First-Line Surgery of Prolactinomas

Authors: M. Huber, M. Luedi, G. Schubert, C. Musahl, A. Tortora, J. Frey, J. Beck, L. Mariani, E. Christ, L. Andereggen
Year: 2022
Publication Date: 2022-02-16
Venue: Frontiers in Endocrinology
DOI: 10.3389/fendo.2022.810219
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background First-line surgery for prolactinomas has gained increasing acceptance, but the indication still remains controversial. Thus, accurate prediction of unfavorable outcomes after upfront surgery in prolactinoma patients is critical for the triage of therapy and for interdisciplinary decision-making. Objective To evaluate whether contemporary machine learning (ML) methods can facilitate this crucial prediction task in a large cohort of prolactinoma patients with first-line surgery, we investigated the performance of various classes of supervised classification algorithms. The primary endpoint was ML-applied risk prediction of long-term dopamine agonist (DA) dependency. The secondary outcome was the prediction of the early and long-term control of hyperprolactinemia. Methods By jointly examining two independent performance metrics – the area under the receiver operating characteristic (AUROC) and the Matthews correlation coefficient (MCC) – in combination with a stacked super learner, we present a novel perspective on how to assess and compare the discrimination capacity of a set of binary classifiers. Results We demonstrate that for upfront surgery in prolactinoma patients there is not a one-algorithm-fits-all solution in outcome prediction: different algorithms perform best for different time points and different outcomes parameters. In addition, ML classifiers outperform logistic regression in both performance metrics in our cohort when predicting the primary outcome at long-term follow-up and secondary outcome at early follow-up, and thus provide an added benefit in risk prediction modeling. In such a setting, the stacking framework of combining the predictions of individual base learners in a so-called super learner offers great potential: the super learner exhibits very good prediction skill for the primary outcome (AUROC: mean 0.9, 95% CI: 0.92 – 1.00; MCC: 0.85, 95% CI: 0.60 – 1.00).
In contrast, predicting control of hyperprolactinemia is challenging, in particular in terms of early follow-up (AUROC: 0.69, 95% CI: 0.50 – 0.83) vs. long-term follow-up (AUROC: 0.80, 95% CI: 0.58 – 0.97). It is of clinical importance that baseline prolactin levels are by far the most important outcome predictor at early follow-up, whereas remissions at 30 days dominate the ML prediction skill for DA-dependency over the long-term. Conclusions This study highlights the performance benefits of combining a diverse set of classification algorithms to predict the outcome of first-line surgery in prolactinoma patients. We demonstrate the added benefit of considering two performance metrics jointly to assess the discrimination capacity of a diverse set of classifiers.
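
For readers unfamiliar with the second metric named in this abstract, the Matthews correlation coefficient can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the function name, the toy labels, and the choice to return 0 when the denominator is zero are illustrative assumptions, not details taken from the paper.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0 when any margin of the table is empty.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)           # tp=2, tn=4, fp=1, fn=1
mcc_trivial = matthews_corrcoef(y_true, [0] * 8)  # always predicts the majority class
```

Unlike accuracy, the MCC of a classifier that always predicts the majority class is 0 here, which is one reason it is often reported alongside AUROC for imbalanced outcomes.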

2022-02-08 — Comparison Between Preoperative Methadone and Buprenorphine Use on Postoperative Opioid Requirement

Authors: R. Komatsu, Michael G Nash, K. Peperzak, Taylor M Ziga, E. Dinges, C. Delgado, Jiang Wu, G. Terman, R. Dale
Year: 2022
Publication Date: 2022-02-08
Venue: The Clinical Journal of Pain
DOI: 10.1097/AJP.0000000000001019
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objectives: Buprenorphine is a partial agonist at mu-opioid receptors and competes for these receptors with other opioids in vitro. Whether patients on buprenorphine maintenance require high doses of opioid analgesics to attain adequate postoperative pain control has not been determined. We evaluated differences in acute postoperative opioid consumption and pain burden between patients taking buprenorphine and those taking methadone preoperatively. Materials and Methods: A retrospective review of medical records of 928 patients, of whom 195 were on buprenorphine and 733 were on methadone preoperatively, was performed. Among methadone and buprenorphine patients, 615 and 89, respectively, continued to receive the medications postoperatively. Buprenorphine patients were compared with methadone patients for the first 48 hours postoperatively with regard to acute opioid dose requirements (morphine milligram equivalents [MME] above their baseline buprenorphine and methadone doses) and time-weighted average (TWA) pain scores (using targeted maximum likelihood estimation). Results: Opioid dose requirements for 48 hours postoperatively were 150 (22 to 297) (median [interquartile range]) and 220 [90 to 360] MME for buprenorphine and methadone patients, respectively. Preoperative buprenorphine was associated with a 59.9% lower postoperative MME (95% confidence interval: 46.6%-69.8%, P<0.0001) compared with methadone. Postoperative TWA pain scores for the first 48 hours were 5.0±2.7 (mean±SD), and 5.4±2.3 for buprenorphine and methadone patients, respectively. Preoperative buprenorphine was associated with a 0.37-point lower TWA pain score (95% confidence interval: 0.14-0.61, P=0.002) compared with methadone. Discussion: Preoperative buprenorphine use was associated with >50% reduction in postoperative opioid dose requirement and a statistically significant, though clinically unimportant, reduction in acute pain burden in comparison to methadone. 
The study is limited by several important factors such as the exclusion of patients requiring intravenous patient-controlled analgesia, small number of patients were on higher dose of buprenorphine, and a large percentage of methadone patients were not on a stable dose of methadone yet.

2022-01-31 — Occupations on the map: Using a super learner algorithm to downscale labor statistics

Authors: Michiel van Dijk, Thijs de Lange, P. van Leeuwen, P. Debie
Year: 2022
Publication Date: 2022-01-31
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0278120
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Detailed and accurate labor statistics are fundamental to support social policies that aim to improve the match between labor supply and demand, and support the creation of jobs. Despite overwhelming evidence that labor activities are distributed unevenly across space, detailed statistics on the geographical distribution of labor and work are not readily available. To fill this gap, we demonstrated an approach to create fine-scale gridded occupation maps by means of downscaling district-level labor statistics, informed by remote sensing and other spatial information. We applied a super-learner algorithm that combined the results of different machine learning models to predict the shares of six major occupation categories and the labor force participation rate at a resolution of 30 arc seconds (~1x1 km) in Vietnam. The results were subsequently combined with gridded information on the working-age population to produce maps of the number of workers per occupation. The super learners outperformed (n = 6) or had similar (n = 1) accuracy in comparison to best-performing single machine learning algorithms. A comparison with an independent high-resolution wealth index showed that the shares of the four low-skilled occupation categories (91% of the labor force), were able to explain between 28% and 43% of the spatial variation in wealth in Vietnam, pointing at a strong spatial relationship between work, income and wealth. The proposed approach can also be applied to produce maps of other (labor) statistics, which are only available at aggregated levels.

2022-01-27 — Data-Driven Compressive Strength Prediction of Fly Ash Concrete Using Ensemble Learner Algorithms

Authors: M. S. Barkhordari, D. J. Armaghani, A. Mohammed, D. Ulrikh
Year: 2022
Publication Date: 2022-01-27
Venue: Buildings
DOI: 10.3390/buildings12020132
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Concrete is one of the most popular materials for building all types of structures, and it has a wide range of applications in the construction industry. Cement production and use have a significant environmental impact due to the emission of different gases. The use of fly ash concrete (FAC) is crucial in eliminating this defect. However, varied features of cementitious composites exist, and understanding their mechanical characteristics is critical for safety. On the other hand, for forecasting the mechanical characteristics of concrete, machine learning approaches are extensively employed algorithms. The goal of this work is to compare ensemble deep neural network models, i.e., the super learner algorithm, simple averaging, weighted averaging, integrated stacking, as well as separate stacking ensemble models, and super learner models, in order to develop an accurate approach for estimating the compressive strength of FAC and reducing the high variance of the predictive models. Separate stacking with the random forest meta-learner received the most accurate predictions (97.6%) with the highest coefficient of determination and the lowest mean square error and variance.

2022-01-27 — Anemia and adverse outcomes in pregnancy: subgroup analysis of the CLIP cluster-randomized trial in India

Authors: J. Bone, M. Bellad, S. Goudar, Ashalata A. Mallapur, U. Charantimath, U. Ramadurg, G. Katageri, Maria Lesperance, M. Woo Kinshella, Raiya Suleman, M. Vidler, Sumedha Sharma, R. Derman, L. Magee, P. von Dadelszen, Shashidhar G. Bannale, Keval S. Chougala, Vaibhav B. Dhamanekar, Anjali M. Joshi, Namdev A. Kamble, Gudadayya S. Kengapur, Uday S. Kudachi, Sphoorthi S. Mastiholi, Geetanjali I Mungarwadi, Esperança Sevene, K. Munguambe, C. Sacoor, E. Macete, H. Boene, Felizarda Amose, O. Augusto, C. Bique, Ana Ilda Biz, Rogério Chiaú, Silvestre Cutana, Paulo Filimone, Emília Gonçalves, Marta Macamo, Salésio Macuácua, S. Maculuve, Ernesto Mandlate, Analisa Matavele, S. Mocumbi, Dulce Mulungo, Zefanias Nhamirre, Ariel Nhancolo, Cláudio Nkumbula, Vivalde Nobela, Rosa Pires, Corsino Tchavana, Anifa Valá, F. Vilanculo, R. Qureshi, Sana Sheikh, Z. Hoodbhoy, I. Ahmed, Amjad Hussain, J. Memon, Farrukh Raza, O. Adetoro, J. Sotunsa, S. Drebit, C. Kariya, Mansun Lui, D. Sawchuck, U. Ukah, M. Woo Kinshella, S. Dharamsi, G. Dumont, T. Firoz, A. Betrán, S. Engelbrecht, V. Filippi, W. Grobman, M. Knight, A. Langer, S. Lewin, G. Lewis, C. Mitton, N. Schuurman, James G Thornton, F. Donnay, R. Byaruhanga, B. Darlow, E. Hutton, M. Merialdi, L. Thabane, K. Pickerill, A. Kavi, Chandrashekhar C Karadiguddi, Sangamesh Rakaraddi, Amit Revankar
Year: 2022
Publication Date: 2022-01-27
Venue: BMC Pregnancy and Childbirth
DOI: 10.1186/s12884-022-04714-y
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background Iron-deficiency anemia is a known risk factor for several adverse perinatal outcomes, but data on its impact on specific maternal morbidities is less robust. Further, information on associations between anemia in early pregnancy and subsequent outcomes are understudied. Methods The study population was derived from the Community Level Interventions for Pre-eclampsia (CLIP) trial in Karnataka State, India (NCT01911494). Included were women who were enrolled in either trial arm, delivered by trial end date, and had a baseline measure of hemoglobin (Hb). Anemia was classified by WHO standards into four groups: none (Hb ≥ 11 g/dL), mild (10.0 g/dL ≤ Hb < 11.0 g/dL), moderate (7.0 g/dL ≤ Hb < 10.0 g/dL) and severe (Hb < 7.0 g/dL). Targeted maximum likelihood estimation was used to estimate confounder-adjusted associations between anemia and a composite (and its components) of adverse maternal outcomes, including pregnancy hypertension. E-values were calculated to assess robustness to unmeasured confounding. Results Of 11,370 women included, 10,066 (88.5%) had anemia, that was mild (3690, 32.5%), moderate (6023, 53.0%), or severe (68, 0.6%). Almost all women (> 99%) reported taking iron supplements during pregnancy. Blood transfusions was more often administered to those with anemia that was mild (risk ratio [RR] 2.16, 95% confidence interval [CI] 1.31–3.56), moderate (RR 2.37, 95% CI 1.56–3.59), and severe (RR 5.70, 95% CI 3.00–10.85). No significant association was evident between anemia severity and haemorrhage (antepartum or postpartum) or sepsis, but there was a U-shaped association between anemia severity and pregnancy hypertension and pre-eclampsia specifically, with the lowest risk seen among those with mild or moderate anemia. 
Conclusion In Karnataka State, India, current management strategies for mild-moderate anemia in early pregnancy are associated with similar rates of adverse maternal or perinatal outcomes, and a lower risk of pregnancy hypertension and preeclampsia, compared with no anemia in early pregnancy. Future research should focus on risk mitigation for women with severe anemia, and the potential effect of iron supplementation for women with normal Hb in early pregnancy.
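
The abstract mentions E-values without giving the computation. A minimal sketch follows, assuming the commonly used risk-ratio formula E = RR + sqrt(RR × (RR − 1)), with RR inverted first when it is below 1. The function names are hypothetical and the paper's exact procedure may differ; the inputs are the mild-anemia transfusion estimate quoted above (RR 2.16, 95% CI 1.31–3.56).

```python
import math


def e_value(rr):
    """E-value for a risk ratio point estimate (standard formula, assumed here)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimates are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1.0 <= upper:
        return 1.0  # the interval already includes the null
    return e_value(lower) if lower > 1.0 else e_value(upper)


point = e_value(2.16)            # about 3.74
limit = e_value_ci(1.31, 3.56)   # about 1.95
```

Read loosely, an unmeasured confounder would need risk-ratio associations of roughly 3.7 with both anemia and transfusion to fully explain away the point estimate, under the assumptions of this formula.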

2022-01-13 — Accounting for motion in resting-state fMRI: What part of the spectrum are we characterizing in autism spectrum disorder?

Authors: M. B. Nebel, D. Lidstone, Liwei Wang, David C. Benkeser, S. Mostofsky, Benjamin B. Risk
Year: 2022
Publication Date: 2022-01-13
Venue: bioRxiv
DOI: 10.1016/j.neuroimage.2022.119296
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
The exclusion of high-motion participants can reduce the impact of motion in functional Magnetic Resonance Imaging (fMRI) data. However, the exclusion of high-motion participants may change the distribution of clinically relevant variables in the study sample, and the resulting sample may not be representative of the population. Our goals are two-fold: 1) to document the biases introduced by common motion exclusion practices in functional connectivity research and 2) to introduce a framework to address these biases by treating excluded scans as a missing data problem. We use a study of autism spectrum disorder in children without an intellectual disability to illustrate the problem and the potential solution. We aggregated data from 545 children (8-13 years old) who participated in resting-state fMRI studies at Kennedy Krieger Institute (173 autistic and 372 typically developing) between 2007 and 2020. We found that autistic children were more likely to be excluded than typically developing children, with 28.5% and 16.1% of autistic and typically developing children excluded, respectively, using a lenient criterion and 81.0% and 60.1% with a stricter criterion. The resulting sample of autistic children with usable data tended to be older, have milder social deficits, better motor control, and higher intellectual ability than the original sample. These measures were also related to functional connectivity strength among children with usable data. This suggests that the generalizability of previous studies reporting naïve analyses (i.e., based only on participants with usable data) may be limited by the selection of older children with less severe clinical profiles because these children are better able to remain still during an rs-fMRI scan. We adapt doubly robust targeted minimum loss based estimation with an ensemble of machine learning algorithms to address these data losses and the resulting biases. 
The proposed approach selects more edges that differ in functional connectivity between autistic and typically developing children than the naïve approach, supporting this as a promising solution to improve the study of heterogeneous populations in which motion is common.

2022-01-07 — Causal inference in case of near‐violation of positivity: comparison of methods

Authors: M. Léger, A. Chatton, F. Le Borgne, R. Pirracchio, S. Lasocki, Y. Foucher
Year: 2022
Publication Date: 2022-01-07
Venue: Biometrical journal. Biometrische Zeitschrift
DOI: 10.1002/bimj.202000323
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In causal studies, the near‐violation of the positivity may occur by chance, because of sample‐to‐sample fluctuation despite the theoretical veracity of the positivity assumption in the population. It may mostly happen when the exposure prevalence is low or when the sample size is small. We aimed to compare the robustness of g‐computation (GC), inverse probability weighting (IPW), truncated IPW, targeted maximum likelihood estimation (TMLE), and truncated TMLE in this situation, using simulations and one real application. We also tested different extrapolation situations for the sub‐group with a positivity violation. The results illustrated that the near‐violation of the positivity impacted all methods. We demonstrated the robustness of GC and TMLE‐based methods. Truncation helped in limiting the bias in near‐violation situations, but at the cost of bias in normal conditions. The application illustrated the variability of the results between the methods and the importance of choosing the most appropriate one. In conclusion, compared to propensity score‐based methods, methods based on outcome regression should be preferred when suspecting near‐violation of the positivity assumption.
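
To make the idea of truncation concrete, the sketch below compares a normalized (Hajek) IPW estimate of the average treatment effect with and without bounding the propensity scores. Everything here is an illustrative assumption: the function names, the 0.05/0.95 bounds, and the toy data. The paper may define truncation differently, for example on the weights rather than on the propensity scores.

```python
def truncate(p, lower=0.05, upper=0.95):
    """Clip a propensity score into [lower, upper] (bounds are assumed values)."""
    return min(max(p, lower), upper)


def ipw_ate(treated, outcome, propensity, bounds=None):
    """Normalized IPW estimate of E[Y(1)] - E[Y(0)].

    bounds: optional (lower, upper) tuple; when given, propensity scores are
    truncated before the weights are formed.
    """
    num1 = den1 = num0 = den0 = 0.0
    for a, y, p in zip(treated, outcome, propensity):
        if bounds is not None:
            p = truncate(p, bounds[0], bounds[1])
        if a == 1:
            w = 1.0 / p          # weight for a treated unit
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - p)  # weight for an untreated unit
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


# Hypothetical data: the first unit is treated despite a propensity of 0.01,
# which is the kind of near-violation of positivity the abstract discusses.
treated = [1, 1, 0, 0, 0]
outcome = [1, 0, 1, 0, 0]
propensity = [0.01, 0.5, 0.5, 0.5, 0.2]

ate_raw = ipw_ate(treated, outcome, propensity)
ate_trunc = ipw_ate(treated, outcome, propensity, bounds=(0.05, 0.95))
```

In the untruncated estimate the first unit receives a weight of 100 and dominates the treated mean; truncation caps that weight at 20, which trades variance for possible bias, in line with the trade-off the abstract reports.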

2022-01-06 — Low hepatitis C virus-viremia prevalence yet continued barriers to direct-acting antiviral treatment in people living with HIV in the Netherlands

Authors: C. Isfordink, C. Smit, A. Boyd, M. de Regt, B. Rijnders, R. van Crevel, R. Ackens, P. Reiss, J. Arends, M. van der Valk
Year: 2022
Publication Date: 2022-01-06
Venue: AIDS (London)
DOI: 10.1097/QAD.0000000000003159
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective: To describe hepatitis C virus (HCV)-viremia prevalence and barriers to direct-acting antiviral (DAA) treatment during unrestricted access to DAA in a nationwide cohort of people with HIV (PWH). Design: Retrospective analysis of prospectively collected data. Methods: We calculated yearly HCV-viremia prevalence as proportion of HCV RNA-positive individuals ever HCV-tested. We then included HCV-viremic individuals with ≥1 visit during the era of universal DAA-access (database lock = December 31, 2018). Based on their last visit, individuals were grouped as DAA-treated or -untreated. Variables associated with lack of DAA-treatment were assessed using targeted maximum likelihood estimation. In November 2020, physicians of DAA-untreated individuals completed a questionnaire on barriers to DAA-uptake and onward HCV-transmission risk. Results: Among 25 196 PWH, HCV-viremia decreased from 4% to 5% between 2000 and 2014 to 0.6% in 2019. Being DAA-untreated was associated with HIV-transmission route other than men who have sex with men, older age, infrequent follow-up, severe alcohol use, detectable HIV-RNA, HCV-genotype 3, and larger hospital size. With universal DAA-access, 72 of 979 HCV-viremic individuals remained DAA-untreated at their last visit. Of these, 39 were no longer in care, 27 remained DAA-untreated in care, and six initiated DAA since database lock. Most common physician-reported barriers to DAA-uptake were patient refusal (20/72, 28%) and infrequent visit attendance (19/72, 26%). Only one DAA-untreated individual in care was engaging in activities associated with onward HCV-transmission. Conclusions: Prevalence of HCV-viremic PWH is low in the Netherlands, coinciding with widespread DAA-uptake. Barriers to DAA-uptake appear mostly patient-related, while HCV-transmission seems unlikely from the few DAA-untreated in care.

2022-01-05 — A potential outcomes approach to defining and estimating gestational age-specific exposure effects during pregnancy

Authors: M. Schnitzer, Steve Ferreira Guerra, C. Longo, L. Blais, R. Platt
Year: 2022
Publication Date: 2022-01-05
Venue: Statistical Methods in Medical Research
DOI: 10.1177/09622802211065158
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Many studies seek to evaluate the effects of potentially harmful pregnancy exposures during specific gestational periods. We consider an observational pregnancy cohort where pregnant individuals can initiate medication usage or become exposed to a drug at various times during their pregnancy. An important statistical challenge involves how to define and estimate exposure effects when pregnancy loss or delivery can occur over time. Without proper consideration, the results of standard analysis may be vulnerable to selection bias, immortal time-bias, and time-dependent confounding. In this study, we apply the “target trials” framework of Hernán and Robins in order to define effects based on the counterfactual approach often used in causal inference. This effect is defined relative to a hypothetical randomized trial of timed pregnancy exposures where delivery may precede and thus potentially interrupt exposure initiation. We describe specific implementations of inverse probability weighting, G-computation, and Targeted Maximum Likelihood Estimation to estimate the effects of interest. We demonstrate the performance of all estimators using simulated data and show that a standard implementation of inverse probability weighting is biased. We then apply our proposed methods to a pharmacoepidemiology study to evaluate the potentially time-dependent effect of exposure to inhaled corticosteroids on birthweight in pregnant people with mild asthma.

2022-01-04 — Advanced utilization of multi-learning algorithm: ensemble super learner to map groundwater potential for potable mineral water

Authors: Sanghoon Lee, D. Kaown, E. Koh, Hye-Lim Lee, K. Ko, K. Lee
Year: 2022
Publication Date: 2022-01-04
Venue: Geocarto International
DOI: 10.1080/10106049.2022.2025921
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Although mapping the groundwater quality is crucial for people who require groundwater with strict quality standards, the ability to take intensive measurements has been restricted by a lack of groundwater accessibility. Thus, this study aimed to estimate and map the suitability of groundwater quality for use as potable mineral water. We attempted a novel approach by targeting comprehensive qualities for a specific groundwater use and by adopting a super learner that combines multiple different learning algorithms. The super learner generated a groundwater potential map indicating a zone with a high potential for mineral water and it outperformed the base learners by 21%–74%. Estimation results designated appropriate groundwater development locations for mineral water use, and assessment of predictors determined favorable environments. Consequently, the proposed approach presented a possible method for finding groundwater with the required quality for its optimal usage. Furthermore, it provided the possibility of worldwide application.

2022-01-04 — A comparison of covariate adjustment approaches under model misspecification in individually randomized trials

Authors: Mia S. Tackney, T. Morris, Ian R. White, C. Leyrat, K. Diaz-Ordaz, Elizabeth A. Williamson
Year: 2022
Publication Date: 2022-01-04
Venue: Trials
DOI: 10.1186/s13063-022-06967-6
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Adjustment for baseline covariates in randomized trials has been shown to lead to gains in power and can protect against chance imbalances in covariates. For continuous covariates, there is a risk that the form of the relationship between the covariate and outcome is misspecified when taking an adjusted approach. Using a simulation study focusing on individually randomized trials with small sample sizes, we explore whether a range of adjustment methods are robust to misspecification, either in the covariate–outcome relationship or through an omitted covariate–treatment interaction. Specifically, we aim to identify potential settings where G-computation, inverse probability of treatment weighting (IPTW), augmented inverse probability of treatment weighting (AIPTW) and targeted maximum likelihood estimation (TMLE) offer improvement over the commonly used analysis of covariance (ANCOVA). Our simulations show that all adjustment methods are generally robust to model misspecification if adjusting for a few covariates, sample size is 100 or larger, and there are no covariate–treatment interactions. When there is a non-linear interaction of treatment with a skewed covariate and sample size is small, all adjustment methods can suffer from bias; however, methods that allow for interactions (such as G-computation with interaction and IPTW) show improved results compared to ANCOVA. When there are a high number of covariates to adjust for, ANCOVA retains good properties while other methods suffer from under- or over-coverage. An outstanding issue for G-computation, IPTW and AIPTW in small samples is that standard errors are underestimated; they should be used with caution without the availability of small-sample corrections, development of which is needed. These findings are relevant for covariate adjustment in interim analyses of larger trials.

2022-01-01 — Using machine learning to predict help-seeking among 2016–2018 Pregnancy Risk Assessment Monitoring System participants with postpartum depression symptoms

Authors: R. Fischbein, Heather L Cook, K. Baughman, S. Díaz
Year: 2022
Publication Date: 2022-01-01
Venue: Women's Health
DOI: 10.1177/17455057221139664
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Despite the importance of early identification and treatment, postpartum depression often remains largely undiagnosed with unreported symptoms. While research has identified several factors as prompting help-seeking for postpartum depression symptoms, no research has examined help-seeking for postpartum depression using data from a multi-state/jurisdictional survey analyzed with machine learning techniques. Objectives: This study examines help-seeking among people with postpartum depression symptoms using and demonstrating the utility of machine learning techniques. Methods: Data from the 2016–2018 Pregnancy Risk Assessment Monitoring System, a cross-sectional survey matched with birth certificate data, were used. Six US states/jurisdictions included the outcome help-seeking for postpartum depression symptoms and were used in the analysis. An ensemble method, “Super Learner,” was used to identify the best combination of algorithms and most important variables that predict help-seeking among 1920 recently pregnant people who screen positive for postpartum depression symptoms. Results: The Super Learner predicted well and had an area under the receiver operating curve of 87.95%. It outperformed the highest weighted algorithms which were conditional random forest and stochastic gradient boosting. The following variables were consistently among the top 10 most important variables across the algorithms for predicting increased help-seeking: participants who reported having been diagnosed with postpartum depression, having depression during pregnancy, living in particular US states, being a White compared to Black or Asian American individual, and having a higher maternal body mass index at the time of the survey. Conclusion: These results show the utility of using ensemble machine learning techniques to examine complex topics like help-seeking. 
Healthcare providers should consider the factors identified in this study when screening and conducting outreach and follow-up for postpartum depression symptoms.

2022-01-01 — Identifying HIV sequences that escape antibody neutralization using random forests and collaborative targeted learning

Authors: Yutong Jin, David C. Benkeser
Year: 2022
Publication Date: 2022-01-01
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2021-0053
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Recent studies have indicated that it is possible to protect individuals from HIV infection using passive infusion of monoclonal antibodies. However, in order for monoclonal antibodies to confer robust protection, the antibodies must be capable of neutralizing many possible strains of the virus. This is particularly challenging in the context of a highly diverse pathogen like HIV. It is therefore of great interest to leverage existing observational data sources to discover antibodies that are able to neutralize HIV viruses via residues where existing antibodies show modest protection. Such information feeds directly into the clinical trial pipeline for monoclonal antibody therapies by providing information on (i) whether and to what extent combinations of antibodies can generate superior protection and (ii) strategies for analyzing past clinical trials to identify in vivo evidence of antibody resistance. These observational data include genetic features of many diverse HIV genetic sequences, as well as in vitro measures of antibody resistance. The statistical learning problem we are interested in is developing statistical methodology that can be used to analyze these data to identify important genetic features that are significantly associated with antibody resistance. This is a challenging problem owing to the high-dimensional and strongly correlated nature of the genetic sequence data. To overcome these challenges, we propose an outcome-adaptive, collaborative targeted minimum loss-based estimation approach using random forests. We demonstrate via simulation that the approach enjoys important statistical benefits over existing approaches in terms of bias, mean squared error, and type I error. We apply the approach to the Compile, Analyze, and Tally Nab Panels database to identify AA positions that are potentially causally related to resistance to neutralization by several different antibodies.

2022 — Blurring cluster randomized trials and observational studies using Two-Stage TMLE to address sub-sampling, missingness, and minimal independent units

Authors: Joshua R Nugent, C. Marquez, E. Charlebois, Rachel Abbott
Year: 2022
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2022-01-01 — Biodigester Cookstove Interventions and Child Diarrhea in Semirural Nepal: A Causal Analysis of Daily Observations

Authors: Heather K. Amato, Caitlin Hemlock, K. Andrejko, Anna R. Smith, N. Hejazi, A. Hubbard, S. C. Verma, R. Adhikari, Dhiraj Pokhrel, Kirk R. Smith, J. Graham, A. Pokhrel
Year: 2022
Publication Date: 2022-01-01
Venue: Environmental Health Perspectives
DOI: 10.1289/EHP9468
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Hundreds of thousands of biodigesters have been constructed in Nepal. These household-level systems use human and animal waste to produce clean-burning biogas used for cooking, which can reduce household air pollution from woodburning cookstoves and prevent respiratory illnesses. The biodigesters, typically operated by female caregivers, require the handling of animal waste, which may increase domestic fecal contamination, exposure to diarrheal pathogens, and the risk of enteric infections, especially among young children. Objective: We estimated the effect of daily reported biogas cookstove use on incident diarrhea among children <5y old in the Kavrepalanchok District of Nepal. Secondarily, we assessed effect measure modification and statistical interaction of individual- and household-level covariates (child sex, child age, birth order, exclusive breastfeeding, proof of vaccination, roof type, sanitation, drinking water treatment, food insecurity) as well as recent 14-d acute lower respiratory infection (ALRI) and season. Methods: We analyzed 300,133 person-days for 539 children in an observational prospective cohort study to estimate the average effect of biogas stove use on incident diarrhea using cross-validated targeted maximum likelihood estimation (CV-TMLE). Results: Households reported using biogas cookstoves in the past 3 d for 23% of observed person-days. The adjusted relative risk of diarrhea for children exposed to biogas cookstove use was 1.31 (95% confidence interval (CI): 1.00, 1.71) compared to unexposed children. The estimated effect of biogas stove use on diarrhea was stronger among breastfed children (2.09; 95% CI: 1.35, 3.25) than for nonbreastfed children and stronger during the dry season (2.03; 95% CI: 1.17, 3.53) than in the wet season. Among children exposed to biogas cookstove use, those with a recent ALRI had the highest mean risk of diarrhea, estimated at 4.53 events (95% CI: 1.03, 8.04) per 1,000 person-days. 
Discussion: This analysis provides new evidence that child diarrhea may be an unintended health risk of biogas cookstove use. Additional studies are needed to identify exposure pathways of fecal pathogen contamination associated with biodigesters to improve the safety of these widely distributed public health interventions. https://doi.org/10.1289/EHP9468

2022 — Anticipating the cost of drought events in France by super learning

Authors: Geoffrey Ecoto, A. Chambaz
Year: 2022
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2022 — A Novel Emoji Based Deep Super Learner (EDSL) for Sentiment Classification

Authors: G. Vashisht, Manisha Jailia, Vishesh Goyal
Year: 2022
Venue: International Conference of Soft Computing and Pattern Recognition
DOI: 10.1007/978-3-030-96302-6_29
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021 (95 papers)
2021-12-20 — SUPER LEARNER MODEL IN PREDICTION OF HEART ATTACK BASED ON CARDIAC BIOMARKERS

Authors: Anuradha P., Dr. Vasantha Kalyani David
Year: 2021
Publication Date: 2021-12-20
Venue: Indian Journal of Computer Science and Engineering
DOI: 10.21817/indjcse/2021/v12i6/211206076
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Unstable angina and/or a heart attack is caused when restricted flow of blood to the heart occurs due to narrowed or blocked coronary arteries. ST-Segment Elevation Myocardial Infarction (STEMI) can be diagnosed from an electrocardiogram (ECG), but the ECG might not show variation for Non-ST-Segment Elevation Myocardial Infarction (NSTEMI). So, cardiac biomarkers could be tested in patients presenting with chest pain to confirm whether a heart attack or Acute Myocardial Infarction (AMI) has occurred. Myoglobin, Troponin-I and CK-MB are sensitive biomarkers for diagnosing heart attack/AMI within specific time frames. In this work, a novel real dataset from a hospital comprising cardiac biomarkers’ values of patients was taken, and Machine Learning (ML) classifiers, namely Support Vector Machine, Logistic Regression (LR), XGBoost (XGB), CatBoost, Random Forest (RF), Decision Tree Classifier, Gaussian Naïve Bayes (GNB), and a Majority Vote Ensemble Classifier comprising LR, XGB, GNB, and RF, were applied to the dataset. Then a Super Learner was designed by taking a novel combination of these classifiers. In the comparison, the Super Learner outperformed the other ML classifiers. Subsequently, a graphical user interface prediction tool using the Super Learner model was designed, which would guide those who have chest pain due to AMI to undergo emergency medical care and thereby save lives.

2021-12-15 — A Targeted Approach to Confounder Selection for High-Dimensional Data

Authors: Asad Haris, R. Platt
Year: 2021
Publication Date: 2021-12-15
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
We consider the problem of selecting confounders for adjustment from a potentially large set of covariates, when estimating a causal effect. Recently, the high-dimensional Propensity Score (hdPS) method was developed for this task; hdPS ranks potential confounders by estimating an importance score for each variable and selects the top few variables. However, this ranking procedure is limited: it requires all variables to be binary. We propose an extension of the hdPS to general types of response and confounder variables. We further develop a group importance score, allowing us to rank groups of potential confounders. The main challenge is that our parameter requires either the propensity score or response model; both vulnerable to model misspecification. We propose a targeted maximum likelihood estimator (TMLE) which allows the use of nonparametric, machine learning tools for fitting these intermediate models. We establish asymptotic normality of our estimator, which consequently allows constructing confidence intervals. We complement our work with numerical studies on simulated and real data. Keywords— Causal inference, Confounder selection, High-dimensional data, Targeted maximum likelihood estimation, High-dimensional propensity score

2021-12-10 — Handling missing data when estimating causal effects with targeted maximum likelihood estimation

Authors: S. G. Dashti, Katherine J. Lee, J. Simpson, Ian R. White, J. Carlin, M. Moreno-Betancur
Year: 2021
Publication Date: 2021-12-10
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwae012
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.

2021-12-01 — Validation of Machine Learning-Based Individualized Treatment for Depressive Disorder Using Target Trial Emulation

Authors: Chi-Shin Wu, A. Yang, Shu-Sen Chang, Chia-Ming Chang, Yihao Liu, S. Liao, H. Tsai
Year: 2021
Publication Date: 2021-12-01
Venue: Journal of Personalized Medicine
DOI: 10.3390/jpm11121316
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study aims to develop and validate the use of machine learning-based prediction models to select individualized pharmacological treatment for patients with depressive disorder. This study used data from Taiwan’s National Health Insurance Research Database. Patients with incident depressive disorders were included in this study. The study outcome was treatment failure, which was defined as psychiatric hospitalization, self-harm hospitalization, emergency visits, or treatment change. Prediction models based on the Super Learner ensemble were trained separately for the initial and the next-step treatments if the previous treatments failed. An individualized treatment strategy was developed for selecting the drug with the lowest probability of treatment failure for each patient as the model-selected regimen. We emulated clinical trials to estimate the effectiveness of individualized treatments. The area under the curve of the prediction model using Super Learner was 0.627 and 0.751 for the initial treatment and the next-step treatment, respectively. Model-selected regimens were associated with reduced treatment failure rates, with a 0.84-fold (95% confidence interval (CI) 0.82–0.86) decrease for the initial treatment and a 0.82-fold (95% CI 0.80–0.83) decrease for the next-step. In emulation of clinical trials, the model-selected regimen was associated with a reduced treatment failure rate.

2021-12-01 — The Conditional Super Learner

Authors: G. Valdes, Y. Interian, Efstathios D. Gennatas, M. J. Laan
Year: 2021
Publication Date: 2021-12-01
Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI: 10.1109/TPAMI.2021.3131976
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Using cross validation to select the best model from a library is standard practice in machine learning. Similarly, meta learning is a widely used technique where models previously developed are combined (mainly linearly) with the expectation of improving performance with respect to individual models. In this article we consider the Conditional Super Learner (CSL), an algorithm that selects the best model candidate from a library of models conditional on the covariates. The CSL expands the idea of using cross validation to select the best model and merges it with meta learning. We propose an optimization algorithm that finds a local minimum to the problem posed and prove that it converges at a rate faster than $O_p(n^{-1/4})$. We offer empirical evidence that: (1) CSL is an excellent candidate to substitute stacking and (2) CSL is suitable for the analysis of Hierarchical problems. Additionally, implications for global interpretability are emphasized.

2021-12-01 — Application of Machine Learning Ensemble Super Learner for analysis of the cytokines transported by high density lipoproteins (HDL) of smokers and nonsmokers

Authors: S. Saharan, P. Nagar, K. Creasy, E. Stock, James Feng, M. Malloy, J. Kane
Year: 2021
Publication Date: 2021-12-01
Venue: 2021 International Conference on Computational Science and Computational Intelligence (CSCI)
DOI: 10.1109/CSCI54926.2021.00133
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Smoking is a major cause of cardiac and pulmonary disease, cancer, and other inflammation related diseases. Smoking impairs lipid and lipoprotein metabolism. The observed modification and reduction in levels of HDL in smokers has adverse effects on atheroprotective properties. It has been hypothesized that HDL transports inflammatory cytokines which accelerate tobacco-related diseases. To investigate the role of HDL in the transport of inflammatory cytokines and their detrimental effects on the immune response, it is paramount to compare cytokine levels in HDL for Smoker versus Nonsmoker groups. We isolated HDL from plasma using selected affinity immunosorption of apolipoprotein A-I-bearing lipoproteins, followed by quantitative ELISA of cytokines. We implemented a powerful stacked ensemble Machine Learning algorithm, namely Super Learner (SL) with base-learners: Decision Tree classifier, AdaBoost classifier, Bagging classifier, Extra Tree classifier, Logistic Regression and Random Forest classifier and meta learner: Logistic Regression. Prediction Accuracy metric was used to ascertain the separability efficacy of Smoker versus Nonsmoker based on cytokine levels. Super Learner composed of a Logistic Regression meta learner, achieved a 100% prediction accuracy, outperforming all the base learners. Machine learning-enabled Precision Medicine allows the investigation of the role of novel biomarkers such as HDL-transported cytokines which have a potential to generate valuable molecular insights. The discovery that cytokines are transported by HDL presents a new dimension in understanding inflammatory disorders and the potential for therapeutic intervention. The outstanding classification and prediction performance of Ensemble learning can be leveraged to revolutionize the biomarker discoveries, enabling insight that can lead to novel treatment modalities.

2021-11-10 — Sensitivity analysis of unmeasured confounding in causal inference based on exponential tilting and super learner

Authors: Mi Zhou, W. Yao
Year: 2021
Publication Date: 2021-11-10
Venue: Journal of Applied Statistics
DOI: 10.1080/02664763.2021.1999398
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Causal inference under the potential outcome framework relies on the strongly ignorable treatment assumption. This assumption is usually questionable in observational studies, and the unmeasured confounding is one of the fundamental challenges in causal inference. To this end, we propose a new sensitivity analysis method to evaluate the impact of the unmeasured confounder by leveraging ideas of doubly robust estimators, the exponential tilt method, and the super learner algorithm. Compared to other existing methods of sensitivity analysis that parameterize the unmeasured confounder as a latent variable in the working models, the exponential tilting method does not impose any restrictions on the structure or models of the unmeasured confounders. In addition, in order to reduce the modeling bias of traditional parametric methods, we propose incorporating the super learner machine learning algorithm to perform nonparametric model estimation and the corresponding sensitivity analysis. Furthermore, most existing sensitivity analysis methods require multivariate sensitivity parameters, which make its choice difficult and subjective in practice. In comparison, the new method has a univariate sensitivity parameter with a nice and simple interpretation of log-odds ratios for binary outcomes, which makes its choice and the application of the new sensitivity analysis method very easy for practitioners.

2021-10-24 — A demonstration of Modified Treatment Policies to evaluate shifts in mobility and COVID-19 case rates in U.S. counties.

Authors: Joshua R Nugent, L. Balzer
Year: 2021
Publication Date: 2021-10-24
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwad005
Link: Semantic Scholar
Matched Keywords: super learner, targeted minimum loss based estimation, tmle

Abstract:
Mixed evidence exists of associations between mobility data and COVID-19 case rates. We aimed to evaluate the county-level impact of reducing mobility on new COVID-19 cases in summer/fall 2020 in the United States and to demonstrate modified treatment policies (MTPs) to define causal effects with continuous exposures. Specifically, we investigated the impact of shifting the distribution of 10 mobility indices on the number of newly reported cases per 100,000 residents two weeks ahead. Primary analyses used targeted minimum loss-based estimation (TMLE) with Super Learner to avoid parametric modeling assumptions during statistical estimation and flexibly adjust for a wide range of confounders, including recent case rates. We also implemented unadjusted analyses. For most weeks, unadjusted analyses suggested strong associations between mobility indices and subsequent new case rates. However, after confounder adjustment, none of the indices showed consistent associations under mobility reduction. Our analysis demonstrates the utility of this novel distribution-shift approach to defining and estimating causal effects with continuous exposures in epidemiology and public health.

2021-10-19 — CIMTx: An R Package for Causal Inference with Multiple Treatments using Observational Data

Authors: Liangyuan Hu, Jiayi Ji
Year: 2021
Publication Date: 2021-10-19
Venue: The R Journal
DOI: 10.32614/rj-2022-058
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
CIMTx provides efficient and unified functions to implement modern methods for causal inferences with multiple treatments using observational data with a focus on binary outcomes. The methods include regression adjustment, inverse probability of treatment weighting, Bayesian additive regression trees, regression adjustment with multivariate spline of the generalized propensity score, vector matching and targeted maximum likelihood estimation. In addition, CIMTx illustrates ways in which users can simulate data adhering to the complex data structures in the multiple treatment setting. Furthermore, the CIMTx package offers a unique set of features to address the key causal assumptions: positivity and ignorability. For the positivity assumption, CIMTx demonstrates techniques to identify the common support region for retaining inferential units using inverse probability of treatment weighting, Bayesian additive regression trees and vector matching. To handle the ignorability assumption, CIMTx provides a flexible Monte Carlo sensitivity analysis approach to evaluate how causal conclusions would be altered in response to different magnitude of departure from ignorable treatment assignment.

2021-10-18 — Defining and estimating effects in cluster randomized trials: A methods comparison

Authors: Alejandra Benitez, M. Petersen, M. J. van der Laan, N. Santos, E. Butrick, D. Walker, R. Ghosh, P. Otieno, P. Waiswa, L. Balzer
Year: 2021
Publication Date: 2021-10-18
Venue: Statistics in Medicine
DOI: 10.1002/sim.9813
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Across research disciplines, cluster randomized trials (CRTs) are commonly implemented to evaluate interventions delivered to groups of participants, such as communities and clinics. Despite advances in the design and analysis of CRTs, several challenges remain. First, there are many possible ways to specify the causal effect of interest (eg, at the individual‐level or at the cluster‐level). Second, the theoretical and practical performance of common methods for CRT analysis remain poorly understood. Here, we present a general framework to formally define an array of causal effects in terms of summary measures of counterfactual outcomes. Next, we provide a comprehensive overview of CRT estimators, including the t‐test, generalized estimating equations (GEE), augmented‐GEE, and targeted maximum likelihood estimation (TMLE). Using finite sample simulations, we illustrate the practical performance of these estimators for different causal effects and when, as commonly occurs, there are limited numbers of clusters of different sizes. Finally, our application to data from the Preterm Birth Initiative (PTBi) study demonstrates the real‐world impact of varying cluster sizes and targeting effects at the cluster‐level or at the individual‐level. Specifically, the relative effect of the PTBi intervention was 0.81 at the cluster‐level, corresponding to a 19% reduction in outcome incidence, and was 0.66 at the individual‐level, corresponding to a 34% reduction in outcome risk. Given its flexibility to estimate a variety of user‐specified effects and ability to adaptively adjust for covariates for precision gains while maintaining Type‐I error control, we conclude TMLE is a promising tool for CRT analysis.

2021-10-12 — State-Level Masking Mandates and COVID-19 Outcomes in the United States

Authors: A. I. Wong, L. Balzer
Year: 2021
Publication Date: 2021-10-12
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001453
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Background: We sought to investigate the effect of public masking mandates in US states on COVID-19 at the national level in Fall 2020. Specifically, we aimed to evaluate how the relative growth of COVID-19 cases and deaths would have differed if all states had issued a mandate to mask in public by 1 September 2020 versus if all states had delayed issuing such a mandate. Methods: We applied the Causal Roadmap, a formal framework for causal and statistical inference. We defined the outcome as the state-specific relative increase in cumulative cases and in cumulative deaths 21, 30, 45, and 60 days after 1 September. Despite the natural experiment occurring at the state-level, the causal effect of masking policies on COVID-19 outcomes was not identifiable. Nonetheless, we specified the target statistical parameter as the adjusted rate ratio (aRR): the expected outcome with early implementation divided by the expected outcome with delayed implementation, after adjusting for state-level confounders. To minimize strong estimation assumptions, primary analyses used targeted maximum likelihood estimation with Super Learner. Results: After 60 days and at a national level, early implementation was associated with a 9% reduction in new COVID-19 cases (aRR = 0.91 [95% CI = 0.88, 0.95]) and a 16% reduction in new COVID-19 deaths (aRR = 0.84 [95% CI = 0.76, 0.93]). Conclusions: Although lack of identifiability prohibited causal interpretations, application of the Causal Roadmap facilitated estimation and inference of statistical associations, providing timely answers to pressing questions in the COVID-19 response.

2021-10-01 — Estimating influences of unemployment and underemployment on mental health during the COVID-19 pandemic: who suffers the most?

Authors: J. Lee, A. Kapteyn, Adriane J. Clomax, Haomiao Jin
Year: 2021
Publication Date: 2021-10-01
Venue: Public Health
DOI: 10.1016/j.puhe.2021.09.038
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objectives: The aim of the study was to evaluate whether unemployment and underemployment are associated with mental distress and whether employment insecurity and its mental health consequences are disproportionately concentrated among specific social groups in the United States during the COVID-19 pandemic. Study design: This is a population-based longitudinal study. Methods: Data came from the Understanding America Study, a population-based panel in the United States. Between April and May 2020, 3548 adults who were not out of the labor force were surveyed. Analyses using targeted maximum likelihood estimation examined the association of employment insecurity with depression, assessed using the 2-item Patient Health Questionnaire, and anxiety, measured with the 2-item Generalized Anxiety Disorder scale. Stratified models were evaluated to examine whether employment insecurity and its mental health consequences are disproportionately concentrated among specific social groups. Results: Being unemployed or underemployed was associated with increased odds of having depression (adjusted odds ratio [AOR] = 1.66, 95% confidence interval [CI] = 1.36–2.02) and anxiety (AOR = 1.50, 95% CI = 1.26, 1.79), relative to having a full-time job. Employment insecurity was disproportionately concentrated among Hispanics (54.3%), Blacks (60.6%), women (55.9%), young adults (aged 18–29 years; 57.0%), and those without a college degree (62.7%). Furthermore, Hispanic workers, subsequent to employment insecurity, experienced worse effects on depression (AOR = 2.08, 95% CI = 1.28, 3.40) and anxiety (AOR = 1.95, 95% CI = 1.24, 3.09). Those who completed high school or less reported worse depression subsequent to employment insecurity (AOR = 2.44, 95% CI = 1.55, 3.85). Conclusions: Both unemployment and underemployment threaten mental health during the pandemic, and the mental health repercussions are not felt equally across the population.
Employment insecurity during the pandemic should be considered an important public health concern that may exacerbate pre-existing mental health disparities during and after the pandemic.

2021-09-28 — Hospitalization outcomes among brain metastasis patients receiving radiation therapy with or without stereotactic radiosurgery from the 2005–2014 Nationwide Inpatient Sample

Authors: H. Beydoun, M. Beydoun, Shuyan Huang, Shaker M Eid, A. Zonderman
Year: 2021
Publication Date: 2021-09-28
Venue: Scientific Reports
DOI: 10.1038/s41598-021-98563-y
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
The purpose of this study was to compare hospitalization outcomes among US inpatients with brain metastases who received stereotactic radiosurgery (SRS) and/or non-SRS radiation therapies without neurosurgical intervention. A cross-sectional study was conducted whereby existing data on 35,199 hospitalization records (non-SRS alone: 32,981; SRS alone: 1035; SRS + non-SRS: 1183) from 2005 to 2014 Nationwide Inpatient Sample were analyzed. Targeted maximum likelihood estimation and Super Learner algorithms were applied to estimate average treatment effects (ATE), marginal odds ratios (MOR) and causal risk ratio (CRR) for three distinct types of radiation therapy in relation to hospitalization outcomes, including length of stay (‘≥ 7 days’ vs. ‘< 7 days’) and discharge destination (‘non-routine’ vs. ‘routine’), controlling for patient and hospital characteristics. Recipients of SRS alone (ATE = −0.071, CRR = 0.88, MOR = 0.75) or SRS + non-SRS (ATE = −0.17, CRR = 0.70, MOR = 0.50) had shorter hospitalizations as compared to recipients of non-SRS alone. Recipients of SRS alone (ATE = −0.13, CRR = 0.78, MOR = 0.59) or SRS + non-SRS (ATE = −0.17, CRR = 0.72, MOR = 0.51) had reduced risks of non-routine discharge as compared to recipients of non-SRS alone. Similar analyses suggested recipients of SRS alone had shorter hospitalizations and similar risk of non-routine discharge when compared to recipients of SRS + non-SRS radiation therapies. SRS alone or in combination with non-SRS therapies may reduce the risks of prolonged hospitalization and non-routine discharge among hospitalized US patients with brain metastases who underwent radiation therapy without neurosurgical intervention.

2021-09-28 — Evaluating the robustness of targeted maximum likelihood estimators via realistic simulations in nutrition intervention trials

Authors: Haodong Li, Sonali Rosete, Jeremy Coyle, Rachael V. Phillips, N. Hejazi, I. Malenica, B. Arnold, J. Benjamin-Chung, Andrew N. Mertens, J. Colford, M. J. van der Laan, A. Hubbard
Year: 2021
Publication Date: 2021-09-28
Venue: Statistics in Medicine
DOI: 10.1002/sim.9348
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation

Abstract:
Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that can increase robustness, by adding a layer of cross‐validation (cross‐validated targeted maximum likelihood estimation and double machine learning, as applied to substitution and estimating equation approaches, respectively). While these methods have been evaluated individually on simulated and experimental data sets, a comprehensive analysis of their performance across real data based simulations has yet to be conducted. In this work, we benchmark multiple widely used methods for estimation of the average treatment effect using data from ten different nutrition intervention studies. A nonparametric regression method, undersmoothed highly adaptive lasso, is used to generate the simulated distribution which preserves important features from the observed data and reproduces a set of true target parameters. For each simulated data set, we apply the methods above to estimate the average treatment effects as well as their standard errors and resulting confidence intervals. Based on the analytic results, a general recommendation is put forth for use of the cross‐validated variants of both substitution and estimating equation estimators. We conclude that the additional layer of cross‐validation helps in avoiding unintentional over‐fitting of nuisance parameter functionals and leads to more robust inferences.

2021-09-21 — Personalized Online Machine Learning

Authors: I. Malenica, Rachael V. Phillips, R. Pirracchio, A. Chambaz, A. Hubbard, M. J. Laan
Year: 2021
Publication Date: 2021-09-21
Venue: arXiv.org
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In this work, we introduce the Personalized Online Super Learner (POSL) -- an online ensembling algorithm for streaming data whose optimization procedure accommodates varying degrees of personalization. Namely, POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized (i.e., optimization with respect to baseline covariate subject ID) to many individuals (i.e., optimization with respect to common baseline covariates). As an online algorithm, POSL learns in real-time. POSL can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed algorithms that are never updated during the procedure, pooled algorithms that learn from many individuals' time-series, and individualized algorithms that learn from within a single time-series. POSL's ensembling of this hybrid of base learning strategies depends on the amount of data collected, the stationarity of the time-series, and the mutual characteristics of a group of time-series. In essence, POSL decides whether to learn across samples, through time, or both, based on the underlying (unknown) structure in the data. For a wide range of simulations that reflect realistic forecasting scenarios, and in a medical data application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for time-series data and adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time-series enter/exit dynamically over chronological time.

2021-09-08 — Association of medical male circumcision and sexually transmitted infections in a population-based study using targeted maximum likelihood estimation

Authors: L. Amusa, T. Zewotir, D. North, A. Kharsany, Lara Lewis
Year: 2021
Publication Date: 2021-09-08
Venue: BMC Public Health
DOI: 10.1186/s12889-021-11705-9
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Epidemiological theory and many empirical studies support the hypothesis that there is a protective effect of male circumcision against some sexually transmitted infections (STIs). However, there is a paucity of randomized control trials (RCTs) to test this hypothesis in the South African population. Due to the infeasibility of conducting RCTs, estimating marginal or average treatment effects with observational data increases interest. Using targeted maximum likelihood estimation (TMLE), a doubly robust estimation technique, we aim to provide evidence of an association between medical male circumcision (MMC) and two STI outcomes. HIV and HSV-2 status were the two primary outcomes for this study. We investigated the associations between MMC and these STI outcomes, using cross-sectional data from the HIV Incidence Provincial Surveillance System (HIPSS) study in KwaZulu-Natal, South Africa. HIV antibodies were tested from the blood samples collected in the study. For HSV-2, serum samples were tested for HSV-2 antibodies via an ELISA-based anti-HSV-2 IgG. We estimated marginal prevalence ratios (PR) using TMLE and compared estimates with those from propensity score full matching (PSFM) and inverse probability of treatment weighting (IPTW). From a total 2850 male participants included in the analytic sample, the overall weighted prevalence of HIV was 32.4% (n = 941) and HSV-2 was 53.2% (n = 1529). TMLE estimates suggest that MMC was associated with 31% lower HIV prevalence (PR: 0.690; 95% CI: 0.614, 0.777) and 21.1% lower HSV-2 prevalence (PR: 0.789; 95% CI: 0.734, 0.848). The propensity score analyses also provided evidence of association of MMC with lower prevalence of HIV and HSV-2. For PSFM: HIV (PR: 0.689; 95% CI: 0.537, 0.885), and HSV-2 (PR: 0.832; 95% CI: 0.709, 0.975). For IPTW: HIV (PR: 0.708; 95% CI: 0.572, 0.875), and HSV-2 (PR: 0.837; 95% CI: 0.738, 0.949). 
Using a TMLE approach, we present further evidence of a protective association of MMC against HIV and HSV-2 in this hyper-endemic South African setting. TMLE has the potential to enhance the evidence base for recommendations that embrace the effect of public health interventions on health or disease outcomes.

2021-09-03 — University of California, Berkeley

Authors: Colleen Reding
Year: 2021
Publication Date: 2021-09-03
Venue: Grad's Guide to Graduate Admissions Essays
DOI: 10.4324/9781003235361-23
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Doubly-robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double-robustness does not readily extend to inference. We present a general theoretical study of the behavior of doubly-robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different approaches for constructing such estimators and investigate the extent to which they may be modified to also allow doubly-robust inference. We find that while targeted maximum likelihood estimation can be used to solve this problem very naturally, common alternative frameworks appear to be inappropriate for this purpose. We provide a theoretical study and a numerical evaluation of the alternatives considered. Our simulations highlight the need and usefulness of these approaches in practice, while our theoretical developments have broad implications for the construction of estimators that permit doubly-robust inference in other problems.

2021-09-01 — P09 Estimation of the causal effect of church attendance on risk of Mycobacterium tuberculosis infection in young children in rural Malawi using targeted maximum likelihood estimation

Authors: P. Khan, K. Baisley, Leo Martinez, T. Mzembe, R. Chiumya, K. Kranzer, P. Fine, K. Fielding, A. Crampin, J. Glynn
Year: 2021
Publication Date: 2021-09-01
Venue: SSM Annual Scientific Meeting
DOI: 10.1136/jech-2021-ssmabstracts.99
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
their social networks but a need to be constantly available was sometimes overwhelming, suggesting an ‘over-stimulation’ effect. Conclusion Caregivers and teachers should take a nuanced approach to addressing young people’s SMU rather than following the dominant alarmist discourse. A measured approach should be taken, providing clear, reasonable guidance and boundary-setting but also promoting trust and responsible time management, and acknowledging the role of social media in making connections. Understanding and sharing in online experiences is likely to promote social connectedness. Supporting young people to negotiate breathing space in online interactions and prioritising trust over availability in peer relationships may optimise the role of social media in promoting peer connectedness in particular.

2021-09-01 — Effect of a patient-centered hypertension delivery strategy on all-cause mortality: Secondary analysis of SEARCH, a community-randomized trial in rural Kenya and Uganda

Authors: Matthew D. Hickey, J. Ayieko, A. Owaraganise, Nicholas Sim, L. Balzer, J. Kabami, Mucunguzi Atukunda, Fredrick J Opel, Erick M Wafula, Marilyn Nyabuti, L. Brown, G. Chamie, V. Jain, James Peng, D. Kwarisiima, C. Camlin, E. Charlebois, C. Cohen, E. Bukusi, M. Kamya, M. Petersen, D. Havlir
Year: 2021
Publication Date: 2021-09-01
Venue: PLoS Medicine
DOI: 10.1371/journal.pmed.1003803
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Hypertension treatment reduces morbidity and mortality yet has not been broadly implemented in many low-resource settings, including sub-Saharan Africa (SSA). We hypothesized that a patient-centered integrated chronic disease model that included hypertension treatment and leveraged the HIV care system would reduce mortality among adults with uncontrolled hypertension in rural Kenya and Uganda. Methods and findings: This is a secondary analysis of the SEARCH trial (NCT:01864603), in which 32 communities underwent baseline population-based multidisease testing, including hypertension screening, and were randomized to standard country-guided treatment or to a patient-centered integrated chronic care model including treatment for hypertension, diabetes, and HIV. Patient-centered care included on-site introduction to clinic staff at screening, nursing triage to expedite visits, reduced visit frequency, flexible clinic hours, and a welcoming clinic environment. The analytic population included nonpregnant adults (≥18 years) with baseline uncontrolled hypertension (blood pressure ≥140/90 mm Hg). The primary outcome was 3-year all-cause mortality with comprehensive population-level assessment. Secondary outcomes included hypertension control assessed at a population level at year 3 (defined per country guidelines as at least 1 blood pressure measure <140/90 mm Hg on 3 repeated measures). Between-arm comparisons used cluster-level targeted maximum likelihood estimation. Among 86,078 adults screened at study baseline (June 2013 to July 2014), 10,928 (13%) had uncontrolled hypertension. Median age was 53 years (25th to 75th percentile 40 to 66); 6,058 (55%) were female; 677 (6%) were HIV infected; and 477 (4%) had diabetes mellitus. Overall, 174 participants (3.2%) in the intervention group and 225 participants (4.1%) in the control group died during 3 years of follow-up (adjusted relative risk (aRR) 0.79, 95% confidence interval (CI) 0.64 to 0.97, p = 0.028).
Among those with baseline grade 3 hypertension (≥180/110 mm Hg), 22 (4.9%) in the intervention group and 42 (7.9%) in the control group died during 3 years of follow-up (aRR 0.62, 95% CI 0.39 to 0.97, p = 0.038). Estimated population-level hypertension control at year 3 was 53% in intervention and 44% in control communities (aRR 1.22, 95% CI 1.12 to 1.33, p < 0.001). Study limitations include inability to identify specific causes of death and control conditions that exceeded current standard hypertension care. Conclusions: In this cluster randomized comparison where both arms received population-level hypertension screening, implementation of a patient-centered hypertension care model was associated with a 21% reduction in all-cause mortality and a 22% improvement in hypertension control compared to standard care among adults with baseline uncontrolled hypertension. Patient-centered chronic care programs for HIV can be leveraged to reduce the overall burden of cardiovascular mortality in SSA. Trial registration: ClinicalTrials.gov NCT01864603.

2021-09-01 — 521 Performance of doubly-robust, machine learning effect estimators in realistic epidemiologic data settings and practical recommendations

Authors: J. Huang, Xiang Meng
Year: 2021
Publication Date: 2021-09-01
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyab168.293
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Flexible, data-adaptive algorithms (machine learning; ML) for nuisance parameter estimation in epidemiologic causal inference have promising asymptotic properties for complex, high-dimensional data. However, recently proposed applications (e.g. targeted maximum likelihood estimation; TMLE) may produce biased parameter and standard error estimates in common real-world cohort settings. The relative performance of these novel estimators over simpler approaches in such settings is unclear. We apply double-crossfit TMLE, augmented inverse probability weighting (AIPW), and standard IPW to simple simulations (5 covariates) and “real-world” data using covariate-structure-preserving (“plasmode”) simulations of 1,178 subjects and 331 covariates from a longitudinal birth cohort. We evaluate various data generating and estimation scenarios including: under- and over- (e.g. excess orthogonal covariates) identification, poor data support, near-instruments, and mis-specified biological interactions. We also track representative computation times. We replicate optimal performance of cross-fit, doubly robust estimators in simple data generating processes. However, in nearly every real world-based scenario, estimators fit with parametric learners outperform those that include non-parametric learners in terms of mean bias and confidence interval coverage. Even when correctly specified, estimators fit with non-parametric algorithms (xgboost, random forest) performed poorly (e.g. 24% bias, 57% coverage vs. 10% bias, 79% coverage for parametric fit), at times underperforming simple IPW. In typical epidemiologic data sets, double-crossfit estimators fit with simple smooth, parametric learners may be the optimal solution, taking 2-5 times less computation time than flexible non-parametric models, while having equal or better performance. No approaches are optimal, and estimators should be compared on simulations close to the source data.
In epidemiologic studies, use of flexible non-parametric algorithms for effect estimation should be strongly justified (i.e. high-dimensional covariates) and performed with care. Parametric learners may be a safer option with few drawbacks.

2021-09-01 — 1321 Marginal structural model of the causal role of hepatitis B vaccination in multiple sclerosis risk

Authors: S. Akhtar, Hadeel El-Muzaini, R. Alroughani
Year: 2021
Publication Date: 2021-09-01
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyab168.023
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
There are conflicting reports regarding the association between uptake of recombinant vaccine against hepatitis B virus (HBV) and risk of multiple sclerosis (MS). Most cohort or case-control studies found no significant short- or long-term increase in MS risk after immunization, whereas others reported a significant increase in MS risk within three years of HBV vaccination. The present matched case-control study was conducted to test the hypothesis that recombinant HBV vaccination status is causally associated with MS risk using targeted maximum likelihood estimation (TMLE), which uses data-adaptive flexible machine learning algorithms to estimate the causal parameters. A total of 110 confirmed incident MS cases and 110 community controls matched (1:1) on age (± 5 years), gender and nationality were enrolled. A pre-tested structured questionnaire was used to collect data on demographics, environmental factors, comorbidities and history of vaccinations through face-to-face interviews with both cases and controls. We implemented case-control-weighted TMLE, a doubly robust, multistep procedure, to estimate causal relative risk (RR), marginal odds ratio (OR) and population attributable fraction. This study demonstrated a non-specific protective effect of HBV vaccine against MS risk as estimated by TMLE (causal RR 0.63, 95% CI: 0.45-0.90; p = 0.004; marginal OR 0.43; 95% CI: 0.18-0.67; p = 0.006). The population attributable fraction was 20% (95% CI: 6%, 34%; p = 0.014). Subject to inherent limitations of the case-control design, this study suggests a non-specific protective effect of recombinant HBV vaccination against MS risk. Future studies may seek to confirm these results. Causal analysis showed a non-specific protective causal association between uptake of recombinant HBV vaccine and MS risk in the study population.

2021-09-01 — 1223 Handling missing data for causal effect estimation in cohort studies using Targeted Maximum Likelihood Estimation

Authors: G. Dashti, Katherine J. Lee, J. Simpson, Ian R. White, J. Carlin, M. Moreno-Betancur
Year: 2021
Publication Date: 2021-09-01
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyab168.150
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Causal inference from cohort studies is central to epidemiological research. Targeted Maximum Likelihood Estimation (TMLE) is an appealing doubly robust method for causal effect estimation, but it is unclear how missing data should be handled when it is used in conjunction with machine learning approaches for the exposure and outcome models. This is problematic because missing data are ubiquitous and can result in biased estimates and loss of precision if handled inappropriately. Based on a motivating example from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate the performance of available approaches for handling missing data when using TMLE with machine learning. These included complete-case analysis; an extended TMLE approach incorporating an outcome missingness probability model; the missing indicator approach for missing covariate data (MCMI); and multiple imputation (MI) using standard parametric approaches or machine learning algorithms. We considered 11 missingness mechanisms typical in cohort studies, and a simple and a complex setting, in which exposure and outcome generation models included two-way and higher-order interactions. MI using regression with no interactions and MI with random forest yielded estimates with the highest bias. MI with regression including two-way interactions was the best performing method overall. Of the non-MI approaches, MCMI performed the worst. When using TMLE with machine learning to estimate the average causal effect, avoiding standard MI with no interactions and MCMI is recommended. We provide novel guidance for handling missing data for causal effect estimation using TMLE.

2021-08-10 — Time-to-event comparative effectiveness of NOACs vs VKAs in newly diagnosed non-valvular atrial fibrillation patients.

Authors: B. Gallego, Jie Zhu
Year: 2021
Publication Date: 2021-08-10
Venue: medRxiv
DOI: 10.1101/2021.08.06.21261092
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Objective: To investigate the difference in the time-to-event probabilities of ischaemic events, major bleeding and death of NOAC vs VKAs in newly diagnosed non-valvular atrial fibrillation patients. Design: Retrospective observational cohort study. Setting: UK's Clinical Practice Research Data linked to the Hospital Episode Statistics inpatient and outpatient data, mortality data and the Patient Level Index of Multiple Deprivation. Participants: Patients over 18 years of age, with an initial diagnosis of atrial fibrillation between 1st-Mar-2011 and 31-July-2017, without a record for a valve condition, prosthesis or procedure previous to initial diagnosis, and without a record of oral anticoagulant treatment in the previous year. Intervention: Oral anticoagulant treatment with either vitamin K antagonists (VKAs) or the newer target-specific oral anticoagulants (NOACs). Main Outcome Measures: Ischaemic event, major bleeding event and death from 15 days from initial prescription up to two years follow-up. Statistical Analysis: Treatment effect was defined as the difference in time-to-event probability between NOAC and VKA treatment groups. Treatment and outcomes were modelled using an ensemble of parametric and non-parametric models, and the average and conditional average treatment effects were estimated using one-step Targeted Maximum Likelihood Estimation (TMLE). Heterogeneity of treatment effect was examined using variable importance methods in Bayesian Additive Regression Trees (BART). Results: The average treatment effect of NOAC vs VKA was consistently close to zero across all times, with a temporal average of 0.00 [95% interval: 0.00, 0.00] for ischaemic event, 0.00% [95% interval: -0.01, 0.01] for major bleeding and 0.00 [95% interval: -0.01, 0.01] for death. Only history of major bleeding was found to influence the distribution of treatment effect for major bleeding, but its impact on the associated conditional average treatment effect was not significant.
Conclusions: This study found no statistically significant difference between NOAC and VKA users up to two years of medication use for the prevention of ischaemic events, major bleeding or death.

2021-08-02 — Brain tumor segmentation using cluster ensemble and deep super learner for classification of MRI

Authors: P. Ramya, M. S. Thanabal, C. Dharmaraja
Year: 2021
Publication Date: 2021-08-02
Venue: Journal of Ambient Intelligence and Humanized Computing
DOI: 10.1007/s12652-021-03390-8
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021-07-28 — One-step ahead sequential Super Learning from short times series of many slightly dependent data, and anticipating the cost of natural disasters

Authors: Geoffrey Ecoto, Aurélien F. Bibaut, A. Chambaz
Year: 2021
Publication Date: 2021-07-28
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Suppose that we observe a short time series where each time-t-specific data-structure consists of many slightly dependent data indexed by a and that we want to estimate a feature of the law of the experiment that depends neither on t nor on a. We develop and study an algorithm to learn sequentially which base algorithm in a user-supplied collection best carries out the estimation task in terms of excess risk and oracular inequalities. The analysis, which uses a dependency graph to model the amount of conditional independence within each t-specific data-structure and a concentration inequality by Janson [2004], leverages a large ratio of the number of distinct a's to the degree of the dependency graph in the face of a small number of t-specific data-structures. The so-called one-step ahead Super Learner is applied to the motivating example where the challenge is to anticipate the cost of natural disasters in France.

2021-07-27 — Diagnosis and Classification of Diabetes Mellitus Type 1 and Type 2 Through Super Learner

Authors: N. A, G. Kavitha
Year: 2021
Publication Date: 2021-07-27
DOI: 10.21203/rs.3.rs-718966/v1
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Diabetes Mellitus (DM) plays a significant role in increasing associated health problems worldwide by acting as a comorbid condition. Moreover, it is a progressive illness without severe external symptoms, leading to a fatal impact on the human body if left unnoticed or untreated. This research work aims to use an individual’s lifestyle and ethnic background in assessing the risk of diabetes acting as a comorbid condition. A detailed assessment of the impact of lockdown, with its rapid changes to individuals’ lifestyles due to the pandemic, gives specific insight into individuals becoming susceptible to Diabetes Mellitus. An ensemble of ML algorithms is used to predict the risk of individuals becoming diabetic. The ensemble is trained on the Pima Indian dataset and the Vanderbilt biostatistics diabetes dataset, providing the impact of Type 1 diabetes mellitus. The proposed super learner model provides the highest classification accuracy for T1DM and T2DM, at 97%, compared with an ensemble of algorithms in identifying and classifying individuals as susceptible to DM due to lifestyle and ethnic background.

2021-07-23 — Nonparametric estimation of the causal effect of a stochastic threshold‐based intervention

Authors: Lars van der Laan, Wenbo Zhang, P. Gilbert
Year: 2021
Publication Date: 2021-07-23
Venue: Biometrics
DOI: 10.1111/biom.13690
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Identifying a biomarker or treatment‐dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. In view of this goal, we consider a covariate‐adjusted threshold‐based interventional estimand, which happens to equal the binary treatment–specific mean estimand from the causal inference literature obtained by dichotomizing the continuous biomarker or treatment as above or below a threshold. The unadjusted version of this estimand was considered in Donovan et al. Expanding upon Stitelman et al., we show that this estimand, under conditions, identifies the expected outcome of a stochastic intervention that sets the treatment dose of all participants above the threshold. We propose a novel nonparametric efficient estimator for the covariate‐adjusted threshold‐response function for the case of informative outcome missingness, which utilizes machine learning and targeted minimum‐loss estimation (TMLE). We prove the estimator is efficient and characterize its asymptotic distribution and robustness properties. Construction of simultaneous 95% confidence bands for the threshold‐specific estimand across a set of thresholds is discussed. In the Supporting Information, we discuss how to adjust our estimator when the biomarker is missing at random, as occurs in clinical trials with biased sampling designs, using inverse probability weighting. Efficiency and bias reduction of the proposed estimator are assessed in simulations. The methods are employed to estimate neutralizing antibody thresholds for virologically confirmed dengue risk in the CYD14 and CYD15 dengue vaccine trials.

2021-07-23 — Gauging the impact of Ethiopia’s productive safety net programme on agriculture: Application of targeted maximum likelihood estimation approach

Authors: B. Bahru, M. Zeller
Year: 2021
Publication Date: 2021-07-23
Venue: Journal of Agricultural Economics
DOI: 10.1111/1477-9552.12452
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2021-07-15 — Demystifying Statistical Inference When Using Machine Learning in Causal Research.

Authors: L. Balzer, T. Westling
Year: 2021
Publication Date: 2021-07-15
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwab200
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
In this issue, Naimi et al. (Am J Epidemiol. XXXX;XXX(XX):XXXX-XXXX) discuss a critical topic in public health and beyond: obtaining valid statistical inference when using machine learning in causal research. In doing so, the authors review recent prominent methodological work and recommend: (i) double robust estimators, such as targeted maximum likelihood estimation (TMLE); (ii) ensemble methods, such as Super Learner, to combine predictions from a diverse library of algorithms, and (iii) sample-splitting to reduce bias and improve inference. We largely agree with these recommendations. In this commentary, we highlight the critical importance of the Super Learner library. Specifically, in both simulation settings considered by the authors, we demonstrate that low bias and valid statistical inference can be achieved using TMLE without sample-splitting and with a Super Learner library that excludes tree-based methods but includes regression splines. Whether extremely data-adaptive algorithms and sample-splitting are needed depends on the specific problem and should be informed by simulations reflecting the specific application. More research is needed on practical recommendations for selecting among these options in common situations arising in epidemiology.
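
The role of the candidate library described in this abstract can be illustrated with a small sketch. The code below is a minimal, assumption-laden illustration of a discrete Super Learner (a cross-validated selector that picks the single candidate with the lowest cross-validated risk), not the full Super Learner, which typically fits a weighted combination of candidates, and not the code used in the commentary. The function names (`fit_mean`, `fit_linear`, `cv_risks`, `discrete_super_learner`), the two-candidate library, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_mean(X, y):
    """Candidate 1 (hypothetical): predict the training mean everywhere."""
    mu = y.mean()
    return lambda X_new: np.full(X_new.shape[0], mu)


def fit_linear(X, y):
    """Candidate 2 (hypothetical): ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(X_new.shape[0]), X_new]) @ beta


def cv_risks(X, y, library, n_folds=5, seed=0):
    """Cross-validated mean squared error of every candidate in the library."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    risks = {name: 0.0 for name in library}
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        for name, fit in library.items():
            predict = fit(X[train], y[train])
            # Average the validation-fold risks with equal weight per fold.
            risks[name] += np.mean((y[val] - predict(X[val])) ** 2) / n_folds
    return risks


def discrete_super_learner(X, y, library, n_folds=5, seed=0):
    """Select the candidate with the lowest CV risk and refit it on all data."""
    risks = cv_risks(X, y, library, n_folds=n_folds, seed=seed)
    best = min(risks, key=risks.get)
    return best, library[best](X, y), risks


# Simulated data (an assumption for illustration): a linear signal plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)

library = {"mean": fit_mean, "linear": fit_linear}
best_name, predict, risks = discrete_super_learner(X, y, library)
print(best_name, risks)
```

Because the selector can only choose among the candidates it is given, the composition of the library determines what the ensemble can learn, which is the point the commentary emphasizes.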

2021-07-15 — AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects.

Authors: Yongqi Zhong, Edward H. Kennedy, L. Bodnar, A. Naimi
Year: 2021
Publication Date: 2021-07-15
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwab207
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
An increasing number of recent studies suggest doubly robust estimators with cross-fitting should be used when estimating causal effects with machine learning methods. However, existing programs that implement doubly robust estimators do not all support machine learning methods and cross-fitting, or provide estimates on multiplicative scales. To address these needs, we developed the AIPW package implementing the augmented inverse probability weighting (AIPW) estimation of average causal effects in R. Key features of the AIPW package include cross-fitting and flexible covariate adjustment for observational studies and randomized trials (RCTs). In this paper, we use a simulated RCT to present the implementation of the AIPW estimator. We also perform a simulation study to evaluate the performance of the AIPW package compared with other doubly robust implementations including CausalGAM, npcausal, tmle, and tmle3. Our simulation shows that the AIPW package yielded comparable performance to other programs. Furthermore, we also found that cross-fitting substantively decreases the bias and improves the confidence interval coverage for doubly robust estimators fit with machine learning algorithms. Our findings suggest that the AIPW package can be a useful tool for estimating average causal effects with machine learning methods in RCTs and observational studies.
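
For readers unfamiliar with the estimator this abstract refers to, the following is a minimal sketch of the standard AIPW formula for the average treatment effect on the risk-difference scale. It is written in Python for illustration and does not reproduce the API of the AIPW R package. The function name `aipw_ate` and the toy inputs are hypothetical. The nuisance predictions (outcome regressions `m1`, `m0` and propensity score `e`) are taken as given; in a cross-fitted analysis they would be out-of-fold predictions.

```python
import numpy as np


def aipw_ate(y, a, m1, m0, e):
    """AIPW estimate of the average treatment effect and its standard error.

    y  : observed outcomes
    a  : binary treatment indicators (1 = treated, 0 = control)
    m1 : predicted outcome under treatment, E[Y | A=1, X]
    m0 : predicted outcome under control,   E[Y | A=0, X]
    e  : predicted propensity score, P(A=1 | X), assumed to lie strictly in (0, 1)
    """
    y, a, m1, m0, e = (np.asarray(v, dtype=float) for v in (y, a, m1, m0, e))
    # Per-observation contribution: plug-in difference plus weighted residuals.
    phi = m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)
    estimate = phi.mean()
    # Standard error from the sample standard deviation of the contributions.
    std_error = phi.std(ddof=1) / np.sqrt(len(phi))
    return estimate, std_error


# Toy inputs (assumed for illustration): constant nuisance predictions.
y = [1, 1, 1, 0]
a = [1, 1, 0, 0]
m1 = [0.5, 0.5, 0.5, 0.5]
m0 = [0.5, 0.5, 0.5, 0.5]
e = [0.5, 0.5, 0.5, 0.5]

estimate, std_error = aipw_ate(y, a, m1, m0, e)
print(estimate, std_error)  # prints 0.5 0.5
```

With constant nuisance predictions and a propensity score of 0.5, the estimate reduces to the difference in arm means (1.0 in the treated arm minus 0.5 in the control arm), which makes the toy result easy to check by hand.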

2021-07-09 — A Cognitive Framework to detect AUD patients from EEG signal using Hybrid Super Learning model

Authors: Sricheta Parui, Deborsi Basu
Year: 2021
Publication Date: 2021-07-09
Venue: IEEE International Conference on Electronics, Computing and Communication Technologies
DOI: 10.1109/CONECCT52877.2021.9622723
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Alcohol consumption has serious effects on the human brain and disrupts daily life. It may also cause difficulty recalling memories, instability, and even blackouts. Recognizing early symptoms of substance dependence and providing adequate care in the rehabilitation phase can make a big difference. Screening tests for patients' alcohol dependence have been arbitrary and could misinterpret the true level of alcohol consumption in some cases. Neuroimaging with EEG, however, has shown positive research outcomes in obtaining objective findings when assessing and diagnosing intoxicated patients. This work extracts features from EEG brain signals and then optimizes the feature set step by step using mutual information, feature importance, LASSO regularization, and recursive feature elimination (RFE). The optimized feature set is then used to distinguish AUD (Alcohol Use Disorder) patients from healthy persons. Super Learning approaches are adopted for the classification task, which is accomplished by bagging and boosting results from a set of machine learning classification models. The findings reveal that the ensemble method of feature optimization combined with the hybrid super-learning classification provides better performance. The proposed approach was evaluated on an EEG data set from the UCI Machine Learning Repository; the experimental results substantiate its efficacy and show it to be comparable to state-of-the-art approaches.

2021-07-04 — One-step TMLE for targeting cause-specific absolute risks and survival curves

Authors: H. Rytgaard, M. J. Laan
Year: 2021
Publication Date: 2021-07-04
Venue: Biometrika
DOI: 10.1093/biomet/asad033
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
This paper considers one-step targeted maximum likelihood estimation methodology for multi-dimensional causal parameters in general survival and competing risks settings where event times take place on the positive real line ℝ+ and are subject to right-censoring. We focus on effects of baseline treatment decisions possibly confounded by pre-treatment covariates, but remark that our work generalizes to settings with time-varying treatment regimes and time-dependent confounding. We point out two overall contributions of our work. First, our methods can be used to obtain simultaneous inference for treatment effects on multiple absolute risks in competing risks settings. Second, our methods can be used to achieve inference for the full survival curve, or a full absolute risk curve, across time. The one-step targeted maximum likelihood procedure is based on a one-dimensional universal least favourable submodel for each cause-specific hazard that we implement in recursive steps along a corresponding non-universal multivariate least favourable submodel. Our empirical study demonstrates the practical use of the methods.

2021-07-01 — Using the Super Learner algorithm to predict risk of 30-day readmission after bariatric surgery in the United States.

Authors: Matteo S. Torquati, Morgan Mendis, Huiwen Xu, A. Myneni, K. Noyes, Aaron B. Hoffman, Philip Omotosho, A. Becerra
Year: 2021
Publication Date: 2021-07-01
Venue: Surgery
DOI: 10.1016/j.surg.2021.06.019
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021-07-01 — Prediction of Alternative Drug-Induced Liver Injury Classifications Using Molecular Descriptors, Gene Expression Perturbation, and Toxicology Reports

Authors: W. Lesiński, Krzysztof Mnich, W. Rudnicki
Year: 2021
Publication Date: 2021-07-01
Venue: Frontiers in Genetics
DOI: 10.3389/fgene.2021.661075
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Motivation: Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI, based on the chemical properties of substances and experiments performed on cell lines, would bring a significant reduction in the cost of clinical trials and faster development of drugs. The current study aims to build predictive models of risk of DILI for chemical compounds using multiple sources of information. Methods: Using several supervised machine learning algorithms, we built predictive models for several alternative splits of compounds between DILI and non-DILI classes. To this end, we used chemical properties of the given compounds, their effects on gene expression levels in six human cell lines treated with them, as well as their toxicological profiles. First, we identified the most informative variables in all data sets. Then, these variables were used to build machine learning models. Finally, composite models were built with the Super Learner approach. All modeling was performed using multiple repeats of cross-validation for unbiased and precise estimates of performance. Results: With one exception, gene expression profiles of human cell lines were non-informative and resulted in random models. Toxicological reports were not useful for prediction of DILI. The best results were obtained for models discerning between harmless compounds and those for which any level of DILI was observed (AUC = 0.75). These models were built with Random Forest algorithm that used molecular descriptors.

2021-06-29 — Two-Stage TMLE to reduce bias and improve efficiency in cluster randomized trials

Authors: L. Balzer, M. J. van der Laan, J. Ayieko, M. Kamya, G. Chamie, Joshua Schwab, D. Havlir, M. Petersen
Year: 2021
Publication Date: 2021-06-29
Venue: Biostatistics
DOI: 10.1093/biostatistics/kxab043
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Cluster randomized trials (CRTs) randomly assign an intervention to groups of individuals (e.g., clinics or communities) and measure outcomes on individuals in those groups. While offering many advantages, this experimental design introduces challenges that are only partially addressed by existing analytic approaches. First, outcomes are often missing for some individuals within clusters. Failing to appropriately adjust for differential outcome measurement can result in biased estimates and inference. Second, CRTs often randomize limited numbers of clusters, resulting in chance imbalances on baseline outcome predictors between arms. Failing to adaptively adjust for these imbalances and other predictive covariates can result in efficiency losses. To address these methodological gaps, we propose and evaluate a novel two-stage targeted minimum loss-based estimator to adjust for baseline covariates in a manner that optimizes precision, after controlling for baseline and postbaseline causes of missing outcomes. Finite sample simulations illustrate that our approach can nearly eliminate bias due to differential outcome measurement, while existing CRT estimators yield misleading results and inferences. Application to real data from the SEARCH community randomized trial demonstrates the gains in efficiency afforded through adaptive adjustment for baseline covariates, after controlling for missingness on individual-level outcomes.

2021-06-27 — Combining Teaching Strategies, Learning Strategies, and Elements of Super Learning Principles

Authors: Duli Pllana
Year: 2021
Publication Date: 2021-06-27
Venue: Advances in Social Sciences Research Journal
DOI: 10.14738/assrj.86.10366
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Blending teaching strategies, learning strategies, and elements of super learning principles increases learning outcomes tremendously in any case, situation, or academic subject. Employing teaching and learning strategies adequately has a great impact on an interactive session (in academic subjects or any field), significantly enhances learners’ motivation, considerably improves learners’ self-confidence and self-esteem, and substantially raises learning outcomes. It is impossible to combine all learning and teaching strategies in a single academic subject, because there are many techniques and little time to incorporate them in one lesson or subject. Accordingly, strategic teaching or learning establishes skills or techniques for addressing a lesson or digesting information from it. Learning results also depend on the quantity and quality of the combination of learning strategies, teaching strategies, and components of super learning principles. The more techniques or skills are mixed into a lesson, the greater the positive results in the learning outcomes. Teaching strategies, learning strategies, and super learning elements are closely related; teaching strategies imply learning strategies and elements of super learning. The combination of the three ingredients plays a crucial part in any lesson, academic subject, or general knowledge; mixing all three components together wisely maximizes learning outcomes.

2021-06-24 — The causal effect and impact of reproductive factors on breast cancer using super learner and targeted maximum likelihood estimation: a case-control study in Fars Province, Iran

Authors: A. Almasi-Hashiani, S. Nedjat, R. Ghiasvand, S. Safiri, M. Nazemipour, N. Mansournia, M. Mansournia
Year: 2021
Publication Date: 2021-06-24
Venue: BMC Public Health
DOI: 10.1186/s12889-021-11307-5
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
The relationship between reproductive factors and breast cancer (BC) risk has been investigated in previous studies. Considering the discrepancies in the results, the aim of this study was to estimate the causal effect of reproductive factors on BC risk in a case-control study using the double robust approach of targeted maximum likelihood estimation. This is a causal reanalysis of a case-control study done between 2005 and 2008 in Shiraz, Iran, in which 787 confirmed BC cases and 928 controls were enrolled. Targeted maximum likelihood estimation along with Super Learner were used to analyze the data, and risk ratio (RR), risk difference (RD), and population attributable fraction (PAF) were reported. Our findings did not support parity and age at the first pregnancy as risk factors for BC. The risk of BC was higher among postmenopausal women (RR = 3.3, 95% confidence interval (CI) = (2.3, 4.6)), women with age at first marriage ≥20 years (RR = 1.6, 95% CI = (1.3, 2.1)), and women with a history of oral contraceptive (OC) use (RR = 1.6, 95% CI = (1.3, 2.1)) or breastfeeding duration ≤60 months (RR = 1.8, 95% CI = (1.3, 2.5)). The PAF for menopause status, breastfeeding duration, and OC use were 40.3% (95% CI = 39.5, 40.6), 27.3% (95% CI = 23.1, 30.8) and 24.4% (95% CI = 10.5, 35.5), respectively. Postmenopausal women, and women with a higher age at first marriage, shorter duration of breastfeeding, and history of OC use are at higher risk of BC.

2021-06-21 — Estimation of time‐specific intervention effects on continuously distributed time‐to‐event outcomes by targeted maximum likelihood estimation

Authors: H. Rytgaard, F. Eriksson, M. J. van der Laan
Year: 2021
Publication Date: 2021-06-21
Venue: Biometrics
DOI: 10.1111/biom.13856
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation, tmle

Abstract:
This work considers targeted maximum likelihood estimation (TMLE) of treatment effects on absolute risk and survival probabilities in classical time‐to‐event settings characterized by right‐censoring and competing risks. TMLE is a general methodology combining flexible ensemble learning and semiparametric efficiency theory in a two‐step procedure for substitution estimation of causal parameters. We specialize and extend the continuous‐time TMLE methods for competing risks settings, proposing a targeting algorithm that iteratively updates cause‐specific hazards to solve the efficient influence curve equation for the target parameter. As part of the work, we further detail and implement the recently proposed highly adaptive lasso estimator for continuous‐time conditional hazards with L1‐penalized Poisson regression. The resulting estimation procedure benefits from relying solely on very mild nonparametric restrictions on the statistical model, thus providing a novel tool for machine‐learning‐based semiparametric causal inference for continuous‐time time‐to‐event data. We apply the methods to a publicly available dataset on follicular cell lymphoma where subjects are followed over time until disease relapse or death without relapse. The data display important time‐varying effects that can be captured by the highly adaptive lasso. In our simulations that are designed to imitate the data, we compare our methods to a similar approach based on random survival forests and to the discrete‐time TMLE.
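
The two-step procedure mentioned in this abstract (initial estimation followed by a targeting update that solves the efficient influence curve equation) can be illustrated in a much simpler setting. The sketch below is not the continuous-time, competing-risks estimator of the paper. It assumes a point treatment, a binary outcome, and the average treatment effect as target, and it takes the initial outcome predictions and the propensity score as given. The function name `tmle_ate`, the Newton solver, and the simulated data are all hypothetical choices for illustration.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def tmle_ate(y, a, q1, q0, g, n_iter=50):
    """Targeting step for the ATE with a binary outcome.

    q1, q0 : initial predictions of P(Y=1 | A=1, X) and P(Y=1 | A=0, X)
    g      : propensity score P(A=1 | X), assumed bounded away from 0 and 1
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    q1 = np.clip(np.asarray(q1, dtype=float), 1e-6, 1 - 1e-6)
    q0 = np.clip(np.asarray(q0, dtype=float), 1e-6, 1 - 1e-6)
    g = np.asarray(g, dtype=float)

    # "Clever covariate" evaluated at the observed treatment.
    h = a / g - (1 - a) / (1 - g)
    offset = logit(np.where(a == 1, q1, q0))

    # One-dimensional logistic fluctuation: solve the score equation for
    # epsilon by Newton's method, with the initial fit as a fixed offset.
    eps = 0.0
    for _ in range(n_iter):
        p = expit(offset + eps * h)
        score = np.sum(h * (y - p))
        info = np.sum(h ** 2 * p * (1 - p))
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break

    # Updated predictions under each treatment level, then plug in.
    q1_star = expit(logit(q1) + eps / g)
    q0_star = expit(logit(q0) - eps / (1 - g))
    return np.mean(q1_star - q0_star), eps, q1_star, q0_star


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
g = expit(0.5 * x)
a = rng.binomial(1, g)
y = rng.binomial(1, expit(-0.5 + a + x))

# Deliberately crude initial outcome fit: 0.5 for everyone.
q1_init = np.full(n, 0.5)
q0_init = np.full(n, 0.5)

ate, eps, q1_star, q0_star = tmle_ate(y, a, q1_init, q0_init, g)
print(ate, eps)
```

After the update, the empirical mean of the clever covariate times the residual is approximately zero, which is the property the targeting step is designed to enforce in this simplified setting.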

2021-06-21 — Estimating the Impact of Sustained Social Participation on Depressive Symptoms in Older Adults

Authors: K. Shiba, Jacqueline M. Torres, Adel Daoud, Kosuke Inoue, S. Kanamori, T. Tsuji, M. Kamada, K. Kondo, I. Kawachi
Year: 2021
Publication Date: 2021-06-21
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001395
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Social participation has been suggested as a means to prevent depressive symptoms. However, it remains unclear whether a one-time boost suffices or whether participation needs to be sustained over time for long-term prevention. We estimated the impacts of alternative hypothetical interventions in social participation on subsequent depressive symptoms among older adults. Methods: Data were from a nationwide prospective cohort study of Japanese older adults ≥65 years of age (n = 32,748). We analyzed social participation (1) as a baseline exposure from 2010 (approximating a one-time boost intervention) and (2) as a time-varying exposure from 2010 and 2013 (approximating a sustained intervention). We defined binary depressive symptoms in 2016 using the Geriatric Depression Scale. We used the doubly robust targeted maximum likelihood estimation to address time-dependent confounding. Results: The magnitude of the association between sustained participation and the lower prevalence of depressive symptoms was larger than the association observed for baseline participation only (e.g., prevalence ratio [PR] for participation in any activity = 0.83 [95% confidence interval = 0.79, 0.88] vs. 0.90 [0.87, 0.94]). For activities with a lower proportion of consistent participation over time (e.g., senior clubs), there was little evidence of an association between baseline participation and subsequent depressive symptoms, while an association for sustained participation was evident (e.g., PR for senior clubs = 0.96 [0.90, 1.02] vs. 0.88 [0.79, 0.97]). Participation at baseline but withholding participation in 2013 was not associated with subsequent depressive symptoms. Conclusions: Sustained social participation may be more strongly associated with fewer depressive symptoms among older adults.

2021-06-16 — Prediction of Water Saturation from Well Log Data by Machine Learning Algorithms: Boosting and Super Learner

Authors: Fahimeh Hadavimoghaddam, M. Ostadhassan, Mohammad Ali Sadri, Tatiana Bondarenko, I. Chebyshev, A. Semnani
Year: 2021
Publication Date: 2021-06-16
Venue: Journal of Marine Science and Engineering
DOI: 10.3390/JMSE9060666
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Intelligent predictive methods have the power to reliably estimate water saturation (Sw) compared to conventional experimental methods commonly performed by petrophysicists. However, due to nonlinearity and uncertainty in the data set, the prediction might not be accurate. There exist new machine learning (ML) algorithms, such as gradient boosting techniques, that have shown significant success in other disciplines yet have not been examined for predicting Sw or other reservoir or rock properties in the petroleum industry. To bridge this gap in the literature, in this study, for the first time, a total of five ML code programs, Super Learner along with the boosting algorithms XGBoost, LightGBM, CatBoost, and AdaBoost, are developed to predict water saturation without relying on the resistivity log data. This is important since conventional methods of water saturation prediction that rely on the resistivity log can become problematic in particular formations such as shale or tight carbonates. To do so, two datasets were constructed by collecting several types of well logs (Gamma, density, neutron, sonic, PEF, and without PEF) to evaluate the robustness and accuracy of the models by comparing the results with laboratory-measured data. It was found that Super Learner and XGBoost produced the most accurate output (R2: 0.999 and 0.993, respectively), and, at a considerable distance, CatBoost and LightGBM were ranked third and fourth, respectively. Ultimately, both XGBoost and Super Learner produced negligible errors, but the latter is considered the best of all.

2021-06-16 — Identification of microenvironment related potential biomarkers of biochemical recurrence at 3 years after prostatectomy in prostate adenocarcinoma

Authors: Xiaoru Sun, Lu Wang, Hongkai Li, Chuandi Jin, Yuanyuan Yu, L. Hou, Xinhui Liu, Yifan Yu, Ran Yan, F. Xue
Year: 2021
Publication Date: 2021-06-16
Venue: Aging
DOI: 10.18632/aging.203121
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Prostate adenocarcinoma is one of the leading adult malignancies. Identification of multiple causative biomarkers is necessary and helpful for determining the occurrence and prognosis of prostate adenocarcinoma. We aimed to identify the potential prognostic genes in the prostate adenocarcinoma microenvironment and to estimate the causal effects simultaneously. We obtained the gene expression data of prostate adenocarcinoma from TCGA project and identified the differentially expressed genes based on immune-stromal components. Among these genes, 68 were associated with biochemical recurrence at 3 years after prostatectomy in prostate adenocarcinoma. After adjusting for the minimal sets of confounding covariates, 14 genes (TNFRSF4, ZAP70, ERMN, CXCL5, SPINK6, SLC6A18, CHRM2, TG, CLLU1OS, POSTN, CTSG, NETO1, CEACAM7, and IGLV3-22) related to the microenvironment were identified as prognostic biomarkers using the targeted maximum likelihood estimation. Both the average and individual causal effects were obtained to measure the magnitude of the effect. CIBERSORT and gene set enrichment analyses showed that these prognostic genes were mainly associated with immune responses. POSTN and NETO1 were correlated with androgen receptor expression, a main driver of prostate adenocarcinoma progression. Finally, five genes were validated in another prostate adenocarcinoma cohort (GEO: GSE70770). These findings might lead to the improved prognosis of prostate adenocarcinoma.

2021-06-16 — Association Between Glucagon-Like Peptide 1 Receptor Agonist and Sodium–Glucose Cotransporter 2 Inhibitor Use and COVID-19 Outcomes

Authors: A. Kahkoska, T. Abrahamsen, G. Alexander, T. Bennett, C. Chute, M. Haendel, Klara R. Klein, H. Mehta, Joshua D. Miller, R. Moffitt, T. Stürmer, K. Kvist, J. Buse
Year: 2021
Publication Date: 2021-06-16
Venue: Diabetes Care
DOI: 10.2337/dc21-0065
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Objective: To determine the respective associations of premorbid glucagon-like peptide-1 receptor agonist (GLP1-RA) and sodium–glucose cotransporter 2 inhibitor (SGLT2i) use, compared with premorbid dipeptidyl peptidase 4 inhibitor (DPP4i) use, with severity of outcomes in the setting of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Research Design and Methods: We analyzed observational data from SARS-CoV-2–positive adults in the National COVID Cohort Collaborative (N3C), a multicenter, longitudinal U.S. cohort (January 2018–February 2021), with a prescription for GLP1-RA, SGLT2i, or DPP4i within 24 months of positive SARS-CoV-2 PCR test. The primary outcome was 60-day mortality, measured from positive SARS-CoV-2 test date. Secondary outcomes were total mortality during the observation period and emergency room visits, hospitalization, and mechanical ventilation within 14 days. Associations were quantified with odds ratios (ORs) estimated with targeted maximum likelihood estimation using a super learner approach, accounting for baseline characteristics. Results: The study included 12,446 individuals (53.4% female, 62.5% White, mean ± SD age 58.6 ± 13.1 years). The 60-day mortality was 3.11% (387 of 12,446), with 2.06% (138 of 6,692) for GLP1-RA use, 2.32% (85 of 3,665) for SGLT2i use, and 5.67% (199 of 3,511) for DPP4i use. Both GLP1-RA and SGLT2i use were associated with lower 60-day mortality compared with DPP4i use (OR 0.54 [95% CI 0.37–0.80] and 0.66 [0.50–0.86], respectively). Use of both medications was also associated with decreased total mortality, emergency room visits, and hospitalizations. Conclusions: Among SARS-CoV-2–positive adults, premorbid GLP1-RA and SGLT2i use, compared with DPP4i use, was associated with lower odds of mortality and other adverse outcomes, although DPP4i users were older and generally sicker.

2021-06-04 — Modified Lapidus vs Scarf Osteotomy Outcomes for Treatment of Hallux Valgus Deformity

Authors: M. Reilly, Matthew S. Conti, J. Day, A. MacMahon, Bopha Chrea, Kristin C. Caolo, Nicholas Williams, M. Drakos, S. Ellis
Year: 2021
Publication Date: 2021-06-04
Venue: Foot & ankle international
DOI: 10.1177/10711007211013776
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Background: The Lapidus procedure and scarf osteotomy are indicated for the operative treatment of hallux valgus; however, no prior studies have compared outcomes between the procedures. The aim of this study was to compare clinical and radiographic outcomes between patients with symptomatic hallux valgus treated with the modified Lapidus procedure versus scarf osteotomy. Methods: This retrospective cohort study included patients treated by 1 of 7 fellowship-trained foot and ankle surgeons. Inclusion criteria were age older than 18 years, primary modified Lapidus procedure or scarf osteotomy for hallux valgus, minimum 1-year postoperative Patient-Reported Outcomes Measurement Information System (PROMIS) scores, and minimum 3-month postoperative radiographs. Revision cases were excluded. Clinical outcomes were assessed using 6 PROMIS domains. Pre- and postoperative radiographic parameters were measured on anteroposterior (AP) and lateral weightbearing radiographs. Statistical analysis utilized targeted minimum-loss estimation (TMLE) to control for confounders. Results: A total of 136 patients (73 Lapidus, 63 scarf) with an average of 17.8 months of follow-up were included in this study. There was significant improvement in PROMIS physical function scores in the modified Lapidus (mean change, 5.25; P < .01) and scarf osteotomy (mean change, 5.50; P < .01) cohorts, with no significant differences between the 2 groups (P = .85). After controlling for bunion severity, the probability of having a normal postoperative intermetatarsal angle (IMA; <9 degrees) was 25% lower (P = .04) with the scarf osteotomy compared with the Lapidus procedure. Conclusion: Although the modified Lapidus procedure led to a higher probability of achieving a normal IMA, both procedures yielded similar improvements in 1-year patient-reported outcome measures. Level of Evidence: Level III, retrospective cohort.

2021-06-03 — Identification of Tumor Microenvironment-Related Prognostic Biomarkers for Ovarian Serous Cancer 3-Year Mortality Using Targeted Maximum Likelihood Estimation: A TCGA Data Mining Study

Authors: Lu Wang, Xiaoru Sun, Chuandi Jin, Yue Fan, F. Xue
Year: 2021
Publication Date: 2021-06-03
Venue: Frontiers in Genetics
DOI: 10.3389/fgene.2021.625145
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Ovarian serous cancer (OSC) is one of the leading causes of death across the world. The role of the tumor microenvironment (TME) in OSC has received increasing attention. Targeted maximum likelihood estimation (TMLE) is developed under a counterfactual framework to produce effect estimates at both the population level and the individual level. In this study, we aim to identify TME-related genes and use the TMLE method to estimate their effects on the 3-year mortality of OSC. In total, 285 OSC patients from the TCGA database constituted the study population. The ESTIMATE algorithm was implemented to evaluate immune and stromal components in the TME. Differential analysis between high-score and low-score groups regarding ImmuneScore and StromalScore was performed to select shared differentially expressed genes (DEGs). Univariate logistic regression analysis followed to evaluate associations of DEGs and clinical pathologic factors with 3-year mortality. TMLE analysis was conducted to estimate the average effect (AE), individual effect (IE), and marginal odds ratio (MOR). Validation was performed using three datasets from the Gene Expression Omnibus (GEO) database. In total, 355 DEGs were selected after differential analysis, and 12 of these genes were significant after univariate logistic regression. Four genes remained significant after TMLE analysis. Specifically, ARID3C and FREM2 were negatively correlated with OSC 3-year mortality, while CROCC2 and PTF1A were positively correlated with it. By combining the ESTIMATE and TMLE algorithms, we identified four TME-related genes in OSC. AEs were estimated to provide averaged effects at the population level, while IEs were estimated to provide individualized effects and may be helpful for precision medicine.

2021-06-01 — OP0117 REAL-WORLD EFFECTIVENESS OF TNFI VERSUS NON-TNFI BIOLOGICS ON DISEASE ACTIVITY IN PATIENTS WITH RHEUMATOID ARTHRITIS: DATA FROM THE ACR’S RISE REGISTRY

Authors: M. Gianfrancesco, J. Li, Michael Evans, M. Petersen, G. Schmajuk, J. Yazdany
Year: 2021
Publication Date: 2021-06-01
DOI: 10.1136/ANNRHEUMDIS-2021-EULAR.1032
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Our understanding of how medications such as biologic disease modifying anti-rheumatic drugs and targeted small molecules (b/tsDMARDs) influence disease activity in RA is based largely on randomized controlled trials (RCTs). However, most U.S. trials in RA are limited by small sample sizes and have often excluded patients who are older, male, and from racial/ethnic minorities. Whether effectiveness of b/tsDMARDs varies in these populations has largely been unexplored. We aimed to examine differences in longitudinal RA disease activity by demographic and clinical characteristics using a novel electronic health record data source of rheumatology providers across the U.S. We simulated various treatment assignments of b/tsDMARDs that have been examined in RCTs: namely, TNF-inhibitors (TNFi) and non-TNFi. We included 16,448 individuals from the ACR’s RISE registry with ≥ 2 RA diagnoses (ICD-9: 714.0) ≥ 30 days apart, who had at least 2 recorded clinical disease activity index (CDAI) scores and no historical b/tsDMARD use documented in RISE. b/tsDMARD use and CDAI scores were assessed at each quarter; covariates included sex, race (white, Black, Asian, other), ethnicity (Hispanic/non-Hispanic), age, smoking, obesity, area deprivation index, other DMARD use, RF status, anti-CCP status, and practice type. Longitudinal targeted maximum likelihood estimation estimated the average treatment effect (ATE) of cumulative TNFi vs. non-TNFi use over a 12-month period on CDAI score among the entire population and across various subgroups based on demographic and clinical characteristics, accounting for censoring and time-varying confounding. Approximately 75% of patients were female with a mean age of 65.1 (+/- 13.7) years. Sixty percent of patients were white, 8% Black, 2% Asian, and 30% other/mixed or unknown race; 6% were Hispanic. The mean CDAI score at baseline was 11.3 (+/- 10.7). 
For the overall population, there was no significant difference in disease activity between TNFi and non-TNFi at 12 months (ATE = 0.85, 95% CI -0.26, 1.96; Table 1). Stratified analyses found higher disease activity for TNFi compared to non-TNFi among patients of Black and Asian race, non-Hispanic ethnicity, and female sex. Among Black race patients, TNFi use was associated with a 6.08 point higher CDAI score compared to non-TNFi use (95% CI 1.99, 10.17). In contrast, in Hispanic/Latino ethnicity patients, TNFi use was associated with a lower CDAI score compared to non-TNFi use (ATE = -2.64, 95% CI -3.99, -1.30).

Table 1. Average treatment effect (ATE) of cumulative TNFi vs. non-TNFi use at 12 months on CDAI score in patients with RA
Group | TNFi | Non-TNFi | ATE (95% CI)
Overall (n=16,448) | 8.84 | 7.99 | 0.85 (-0.26, 1.96)
Race: White (n=9,814) | 8.24 | 6.81 | 1.42 (0.03, 2.81)*
Race: Black (n=1,358) | 13.91 | 7.83 | 6.08 (1.99, 10.17)*
Race: Asian (n=301) | 6.54 | 2.74 | 3.80 (2.93, 4.67)*
Ethnicity: Non-Hispanic (n=14,216) | 8.92 | 7.63 | 1.29 (0.08, 2.51)*
Ethnicity: Hispanic (n=938) | 5.69 | 8.33 | -2.64 (-3.99, -1.30)*
Sex: Female (n=12,527) | 8.98 | 7.47 | 1.51 (0.31, 2.72)*
Sex: Male (n=3,921) | 8.57 | 9.49 | -0.92 (-3.42, 1.58)
*P<0.05

Results from this RCT simulation study suggest that non-TNFi may have an important role as first-line agents in the treatment of Black and Asian patients, but not Hispanic patients. These novel findings fill gaps where RCTs have not been conducted, highlight the need for inclusion of diverse populations in future trials, and have the potential to lead to a more personalized approach to rheumatologic care.

Milena Gianfrancesco: None declared, Jing Li: None declared, Michael Evans: None declared, Maya Petersen: None declared, Gabriela Schmajuk: None declared, Jinoos Yazdany Consultant of: Eli Lilly and Astra Zeneca, unrelated to this project., Grant/research support from: Gilead, unrelated to this project.

2021-06-01 — Long-Term Effects of Hearing Aids on Hearing Ability in Patients with Sensorineural Hearing Loss

Authors: A. R. Goel, Haley A Bruce, Nicholas Williams, G. Alexiades
Year: 2021
Publication Date: 2021-06-01
Venue: Journal of the American Academy of Audiology
DOI: 10.1055/s-0041-1731592
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: A frequent concern surrounding amplification with hearing aids for patients with sensorineural hearing loss is whether these devices negatively affect hearing ability. To date, there have been few studies examining the long-term effects of amplification on audiometric outcomes in adults. Purpose: In the present study, we examined how hearing aids affect standard audiometric outcomes over long-term periods of follow-up. Research Design: We retrospectively collected audiometric data in adults with sensorineural hearing loss, constructing a model of long-term outcomes. Study Sample: This retrospective cohort study included 802 ears from 401 adult patients with bilateral sensorineural hearing loss eligible for amplification with hearing aids at a single institution. Intervention: Of the eligible patients, 88 were aided bilaterally, and 313 were unaided. Data Collection and Analysis: We examined the standard three-frequency pure-tone average (PTA3-Freq), a novel extended pure-tone average (PTAExt), and word recognition score (WRS) per ear at each encounter. We then modeled the association between the use of hearing aids for 5 years and these audiometric outcomes using targeted maximum likelihood estimation. Results: In comparing aided and unaided ears at the end of 5 years, there were discernible effects for all measurements. The PTA3-Freq was 5 dB greater in aided ears (95% CI: 1.37–8.64, p = 0.007), WRS was 4.5 percentage points lower in aided ears (95% CI: −9.14 to 0.15, p = 0.058), and PTAExt was 5 dB greater in aided ears (95% CI: 2.18–7.82, p < 0.001), adjusting for measured confounders. Conclusion: Our analysis revealed discernible effects of 5 years of hearing aid use on hearing ability, specifically as measured by the PTA3-Freq, novel PTAExt, and WRS, suggesting a greater decline in hearing ability in patients using hearing aids. 
Future studies are needed to examine these effects between treatment groups over longer periods of time and in more heterogeneous populations to improve clinical practice guidelines and the safety of both prescriptive fitting and nonprescriptive amplification.

2021-05-31 — A Three Layer Super Learner Ensemble with Hyperparameter Optimization to Improve the Performance of Machine Learning Model

Authors: S. Kasthuriarachchi, S. Liyanage
Year: 2021
Publication Date: 2021-05-31
Venue: Advanced technologies
DOI: 10.31357/ait.v1i1.4844
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
A combination of different machine learning models to form a super learner can definitely lead to improved predictions in any domain. The super learner ensemble discussed in this study collates several machine learning models and proposes to enhance the performance by considering the final meta-model accuracy and the prediction duration. An algorithm is proposed to rate the machine learning models derived by combining the base classifiers voted with different weights. The proposed algorithm is named the Log Loss Weighted Super Learner Model (LLWSL). Based on the voted weight, the optimal model is selected and the machine learning method derived is identified. The meta-learner of the super learner uses them by tuning their hyperparameters. The execution time and the model accuracies were evaluated using two separate datasets inside LMSSLIITD extracted from the educational industry by executing the LLWSL algorithm. According to the outcome of the evaluation process, it has been noticed that there exists a significant improvement in the proposed algorithm LLWSL for use in machine learning tasks for the achievement of better performances.

2021-05-26 — Diet and erythrocyte metal concentrations in early pregnancy-cross-sectional analysis in Project Viva.

Authors: Pi-I. D. Lin, Andres Cardenas, S. Rifas-Shiman, M. Hivert, T. James-Todd, C. Amarasiriwardena, R. Wright, Mohammad L. Rahman, E. Oken
Year: 2021
Publication Date: 2021-05-26
Venue: American Journal of Clinical Nutrition
DOI: 10.1093/ajcn/nqab088
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND: Dietary sources of metals are not well established among pregnant women in the United States. OBJECTIVE: We aimed to perform a diet-wide association study (DWAS) of metals during the first trimester of pregnancy. METHODS: In early pregnancy (11.3 ± 2.8 weeks of gestation), 1196 women from Project Viva (recruited 1999-2002 in eastern Massachusetts) completed a validated food-frequency questionnaire (FFQ; 135 food items) and underwent measurements of erythrocyte metals [arsenic (As), barium, cadmium, cesium (Cs), copper, mercury (Hg), magnesium, manganese, lead (Pb), selenium (Se), zinc]. The DWAS involved a systematic evaluation and visualization of all bivariate relations for each food-metal combination. For dietary items with strong associations with erythrocyte metals, we applied targeted maximum likelihood estimations and substitution models to evaluate how hypothetical dietary interventions would influence metals' concentrations. RESULTS: Participants' mean ± SD age was 32.5 ± 4.5 y and prepregnancy BMI was 24.8 ± 5.4 kg/m²; they were mostly white (75.9%), college graduates (72.4%), married or cohabitating (94.6%), had a household income >$70,000/y (63.5%), and had never smoked (67.1%). Compared with other US-based cohorts, the overall diet quality of participants was above average, and concentrations of erythrocyte metals were lower. The DWAS identified significant associations of several food items with As, Hg, Pb, Cs, and Se; for example, As was higher for each SD increment in fresh fruit (11.5%; 95% CI: 4.9%, 18.4%), white rice (17.9%; 95% CI: 9.4%, 26.9%), and seafood (50.9%; 95% CI: 42.8%, 59.3%). Following the guidelines for pregnant women to consume ≤3 servings/wk of seafood was associated with lower As (-0.55 ng/g; 95% CI: -0.82, -0.28 ng/g) and lower Hg (-2.67 ng/g; 95% CI: -3.55, -1.80 ng/g). Substituting white rice with bread, pasta, tortilla, and potato was also associated with lower As (35%-50%) and Hg (35%-70%). 
CONCLUSIONS: Our DWAS provides a systematic evaluation of diet-metals relations. Prenatal diet may be an important source of exposures to metals.

2021-05-22 — Super LeArner Prediction of NAb Panels (SLAPNAP): a containerized tool for predicting combination monoclonal broadly neutralizing antibody sensitivity

Authors: B. Williamson, Craig A. Magaret, Peter B. Gilbert, Sohail Nizam, Courtney Simmons, David C. Benkeser
Year: 2021
Publication Date: 2021-05-22
Venue: Bioinformatics
DOI: 10.1093/bioinformatics/btab398
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
MOTIVATION: A single monoclonal broadly neutralizing antibody (bnAb) regimen was recently evaluated in two randomized trials for prevention efficacy against HIV-1 infection. Subsequent trials will evaluate combination bnAb regimens (e.g., cocktails, multi-specific antibodies), which demonstrate higher potency and breadth in vitro compared to single bnAbs. Given the large number of potential regimens, methods for down-selecting these regimens into efficacy trials are of great interest. RESULTS: We developed Super LeArner Prediction of NAb Panels (SLAPNAP), a software tool for training and evaluating machine learning models that predict in vitro neutralization sensitivity of HIV Envelope (Env) pseudoviruses to a given single or combination bnAb regimen, based on Env amino acid sequence features. SLAPNAP also provides measures of variable importance of sequence features. By predicting bnAb coverage of circulating sequences, SLAPNAP can improve ranking of bnAb regimens by their potential prevention efficacy. In addition, SLAPNAP can improve sieve analysis by defining sequence features that impact bnAb prevention efficacy. AVAILABILITY: SLAPNAP is a freely available docker image that can be downloaded from DockerHub (https://hub.docker.com/r/slapnap/slapnap). Source code and documentation are available at GitHub (respectively, https://github.com/benkeser/slapnap and https://benkeser.github.io/slapnap/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2021-05-20 — Effect of first-line pembrolizumab treatment in individuals with advanced non-small cell lung cancer and poor performance status.

Authors: R. Veluswamy, Jiayi Ji, Liangyuan Hu, Xiaoliang Wang, Cardinale B. Smith, M. Kale
Year: 2021
Publication Date: 2021-05-20
DOI: 10.1200/JCO.2021.39.15_SUPPL.E18796
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
e18796 Background: There is limited evidence supporting the optimal use of immune checkpoint inhibitors (ICIs) in NSCLC patients with poor performance status (PS), as clinical trials exclude these patients. In this study, we use real-world oncology data to determine the impact of first line pembrolizumab vs. no treatment in high PD(L)-1 expressing cancers in individuals with advanced NSCLC and ECOG PS ≥2. Methods: We performed a retrospective cohort study of patients with advanced NSCLC with ECOG PS ≥2 between 09/01/2014 and 02/18/2020, using the nationwide Flatiron Health electronic health record (EHR)-derived de-identified database. Patients were included if they were PD(L)-1 high (≥50%) and had clinical and treatment information recorded within 90 days of diagnosis. Real-world overall survival (rwOS) was defined as time from diagnosis to death (censored at last EHR activity). Median rwOS was estimated using weighted Kaplan-Meier methods. A marginal Cox structural model with inverse probability of treatment weighting was used to adjust for selection bias and estimate the effectiveness of pembrolizumab. The inverse probability weights were estimated using an ensemble machine learning technique, Super Learner, based on age, gender, race, practice type and smoking history. Adjusted hazard ratios (aHR) were estimated using weighted Cox proportional hazards models. Stratified analysis was conducted by ECOG PS (2 vs >2). Results: 217 (16%) individuals with advanced NSCLC and high PD(L)-1 expression received no treatment, compared to 546 (39%) individuals who received 1L pembrolizumab. The no-treatment group had a lower proportion of ECOG 2 compared to the pembrolizumab group (Table). Median rwOS in the no-treatment group was 2.4 months, compared to 7.1 months in the pembrolizumab group (p<0.001). 
In unadjusted survival analyses in the entire cohort and in cohorts stratified by ECOG status, treatment with pembrolizumab was associated with a significantly lower risk of death (hazard ratio [HR]: 0.38, 95% Confidence Interval [CI]: 0.31-0.45). In adjusted analyses, individuals treated with pembrolizumab had improved survival (HR: 0.40, 95% CI: 0.35-0.45). Conclusions: Our analysis of real-world clinical oncology data demonstrated that 1L treatment with pembrolizumab was associated with significantly improved rwOS among individuals with ECOG ≥ 2. [Table: see text]
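
The inverse probability of treatment weighting step described in this abstract can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated by some model (the abstract uses Super Learner; here they are simply an input), and it uses the common stabilized form of the weights, which the abstract does not specify. The function name and toy values are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def stabilized_iptw(treated: Sequence[int], propensity: Sequence[float]) -> List[float]:
    """Stabilized inverse-probability-of-treatment weights.

    treated[i] is 1 if subject i received treatment and 0 otherwise.
    propensity[i] is an estimate of P(treated = 1 | covariates) for subject i.
    """
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    if any(not 0.0 < p < 1.0 for p in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Marginal probability of treatment, used as the stabilizing numerator.
    marginal = sum(treated) / len(treated)

    weights = []
    for a, p in zip(treated, propensity):
        if a == 1:
            weights.append(marginal / p)  # treated: P(A=1) / P(A=1 | W)
        else:
            weights.append((1.0 - marginal) / (1.0 - p))  # untreated: P(A=0) / P(A=0 | W)
    return weights


# Toy inputs (hypothetical values, not data from the study).
toy_treated = [1, 1, 0, 0]
toy_propensity = [0.8, 0.5, 0.5, 0.2]
toy_weights = stabilized_iptw(toy_treated, toy_propensity)
```

In a weighted Kaplan-Meier or Cox analysis, weights of this kind would be passed as per-subject case weights; subjects whose observed treatment was unlikely given their covariates receive larger weights.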

2021-05-17 — Feature Selection and Classification of Clinical Datasets Using Bioinspired Algorithms and Super Learner

Authors: S. Murugesan, R. Bhuvaneswaran, H. K. Nehemiah, S. Sankari, Y. Jane
Year: 2021
Publication Date: 2021-05-17
Venue: Computational and Mathematical Methods in Medicine
DOI: 10.1155/2021/6662420
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
A computer-aided diagnosis (CAD) system that employs a super learner to diagnose the presence or absence of a disease has been developed. Each clinical dataset is preprocessed and split into training set (60%) and testing set (40%). A wrapper approach that uses three bioinspired algorithms, namely, cat swarm optimization (CSO), krill herd (KH), and bacterial foraging optimization (BFO) with the classification accuracy of support vector machine (SVM) as the fitness function has been used for feature selection. The selected features of each bioinspired algorithm are stored in three separate databases. The features selected by each bioinspired algorithm are used to train three back propagation neural networks (BPNN) independently using the conjugate gradient algorithm (CGA). Classifier testing is performed by using the testing set on each trained classifier, and the diagnostic results obtained are used to evaluate the performance of each classifier. The classification results obtained for each instance of the testing set of the three classifiers and the class label associated with each instance of the testing set will be the candidate instances for training and testing the super learner. The training set comprises 80% of the instances, and the testing set comprises 20% of the instances. Experimentation has been carried out using seven clinical datasets from the University of California Irvine (UCI) machine learning repository. The super learner has achieved a classification accuracy of 96.83% for Wisconsin diagnostic breast cancer dataset (WDBC), 86.36% for Statlog heart disease dataset (SHD), 94.74% for hepatocellular carcinoma dataset (HCC), 90.48% for hepatitis dataset (HD), 81.82% for vertebral column dataset (VCD), 84% for Cleveland heart disease dataset (CHD), and 70% for Indian liver patient dataset (ILP).

2021-05-14 — Low-cost transcriptional diagnostic to accurately categorize lymphomas in low- and middle-income countries.

Authors: Fabiola Valvert, Oscar Silva, Elizabeth Solórzano-Ortiz, M. Puligandla, Marcos Mauricio Siliézar Tala, Timothy Guyon, Samuel L. Dixon, Nelly López, Francisco López, César Camilo Carías Alvarado, R. Terbrueggen, K. Stevenson, Y. Natkunam, D. Weinstock, Edward L Briercheck
Year: 2021
Publication Date: 2021-05-14
Venue: Blood Advances
DOI: 10.1182/bloodadvances.2021004347
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Inadequate diagnostics compromise cancer care across lower- and middle-income countries (LMICs). We hypothesized that an inexpensive gene expression assay using paraffin-embedded biopsy specimens from LMICs could distinguish lymphoma subtypes without pathologist input. We reviewed all biopsy specimens obtained at the Instituto de Cancerología y Hospital Dr. Bernardo Del Valle in Guatemala City between 2006 and 2018 for suspicion of lymphoma. Diagnoses were established based on the World Health Organization classification and then binned into 9 categories: nonmalignant, aggressive B-cell, diffuse large B-cell, follicular, Hodgkin, mantle cell, marginal zone, natural killer/T-cell, or mature T-cell lymphoma. We established a chemical ligation probe-based assay (CLPA) that quantifies expression of 37 genes by capillary electrophoresis with reagent/consumable cost of approximately $10/sample. To assign bins based on gene expression, 13 models were evaluated as candidate base learners, and class probabilities from each model were then used as predictors in an extreme gradient boosting super learner. Cases with call probabilities < 60% were classified as indeterminate. Four (2%) of 194 biopsy specimens in storage <3 years experienced assay failure. Diagnostic samples were divided into 70% (n = 397) training and 30% (n = 163) validation cohorts. Overall accuracy for the validation cohort was 86% (95% confidence interval [CI]: 80%-91%). After excluding 28 (17%) indeterminate calls, accuracy increased to 94% (95% CI: 89%-97%). Concordance was 97% for a set of high-probability calls (n = 37) assayed by CLPA in both the United States and Guatemala. Accuracy for a cohort of relapsed/refractory biopsy specimens (n = 39) was 79% overall and 88% after excluding indeterminate cases. Machine-learning analysis of gene expression accurately classifies paraffin-embedded lymphoma biopsy specimens and could transform diagnosis in LMICs.
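
The indeterminate-call rule in this abstract (call probability below 60%) and the two accuracy figures it reports can be illustrated with a small sketch. The code below is a hypothetical illustration only: it assumes "overall accuracy" scores the top call for every sample and that the second figure scores only calls at or above the threshold, which the abstract does not spell out. The function names and toy probabilities are invented for the example.

```python
from typing import Dict, Optional, Sequence, Tuple


def top_call(class_probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable class and its probability."""
    return max(class_probs.items(), key=lambda kv: kv[1])


def summarize_calls(
    prob_rows: Sequence[Dict[str, float]],
    truths: Sequence[str],
    threshold: float = 0.60,
) -> Dict[str, Optional[float]]:
    """Accuracy over all samples and over confident (non-indeterminate) calls."""
    if len(prob_rows) != len(truths) or not truths:
        raise ValueError("prob_rows and truths must be non-empty and equal in length")

    n_correct_all = 0
    n_confident = 0
    n_correct_confident = 0
    for probs, truth in zip(prob_rows, truths):
        label, prob = top_call(probs)
        correct = int(label == truth)
        n_correct_all += correct
        if prob >= threshold:  # below the threshold the call is indeterminate
            n_confident += 1
            n_correct_confident += correct

    return {
        "overall_accuracy": n_correct_all / len(truths),
        "n_indeterminate": len(truths) - n_confident,
        "confident_accuracy": (
            n_correct_confident / n_confident if n_confident else None
        ),
    }


# Toy class probabilities (hypothetical, two classes only for brevity).
toy_rows = [
    {"follicular": 0.90, "hodgkin": 0.10},
    {"follicular": 0.55, "hodgkin": 0.45},
    {"follicular": 0.30, "hodgkin": 0.70},
    {"follicular": 0.65, "hodgkin": 0.35},
]
toy_truths = ["follicular", "hodgkin", "hodgkin", "hodgkin"]
toy_summary = summarize_calls(toy_rows, toy_truths)
```

Excluding low-confidence calls trades coverage for accuracy: in the toy data one of four samples is set aside, and accuracy among the remaining calls is higher than accuracy over all samples.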

2021-05-05 — Continuous-time targeted minimum loss-based estimation of intervention-specific mean outcomes

Authors: H. Rytgaard, T. Gerds, M. van der Laan
Year: 2021
Publication Date: 2021-05-05
Venue: Annals of Statistics
DOI: 10.1214/21-aos2114
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted minimum loss based estimation, tmle

Abstract:
This paper generalizes the targeted minimum loss-based estimation (TMLE) framework to allow for estimating the effects of time-varying interventions in settings where interventions, covariates, and outcomes can all occur at subject-specific time-points on an arbitrarily fine time-scale. TMLE is a general template for constructing asymptotically linear substitution estimators for smooth low-dimensional parameters in infinite-dimensional models. Existing longitudinal TMLE methods are developed for data where observations are made on a discrete time-grid. We consider a continuous-time counting process model where intensity measures track the monitoring of subjects, and focus on a low-dimensional target parameter defined as the intervention-specific mean outcome at the end of follow-up. To construct our TMLE algorithm for the given statistical estimation problem we derive an expression for the efficient influence curve and represent the target parameter as a functional of intensities and conditional expectations. The high-dimensional nuisance parameters of our model are estimated and updated in an iterative manner according to separate targeting steps for the involved intensities and conditional expectations. The resulting estimator solves the efficient influence curve equation. We state a general efficiency theorem and describe a highly adaptive lasso estimator for nuisance parameters that allows us to establish asymptotic linearity and efficiency of our estimator under minimal conditions on the underlying statistical model.

2021-05-01 — Using Causal Analysis to Measure Corticosteroids’ or Tocilizumab’s Effect on Mortality in Patients with COVID-19

Authors: B. Lee, K. Chavez, C. Schorr, H. Bach, A. Banerjee, K. Quevada, S. Patel
Year: 2021
Publication Date: 2021-05-01
Venue: TP51. TP051 COVID: LUNG INFECTION, MULTIORGAN FAILURE, AND CARDIOVASCULAR
DOI: 10.1164/AJRCCM-CONFERENCE.2021.203.1_MEETINGABSTRACTS.A2646
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
RATIONALE: Severe inflammation is thought to drive disease severity in patients with COVID-19. Proposed treatments include corticosteroids and tocilizumab. Corticosteroids have demonstrated outcome benefit in patients with severe ARDS [1,2]. Tocilizumab (an interleukin-6 receptor antibody) is theorized to block the inflammatory cascade [3]. Variable steroid administration for COVID-19 illness stemmed from the possibility of increased viral shedding and limited survival benefit noted from other coronavirus strains [4]. Lack of standardization for tocilizumab and corticosteroid administration leaves retrospective study prone to confounding and bias; therefore, we proposed using causal analysis to better determine treatment efficacy. In clinical observational studies, it is only possible to observe an individual participant under one treatment scenario, whereas the counterfactual (alternate) outcome is unknown [5]. Targeted Maximum Likelihood Estimator (TMLE) generates two matched subject populations using assigned weights to create a pseudo-population that models counterfactual outcomes while limiting confounding bias [6]. The causal analysis objective was to elucidate the Average Treatment Effect (ATE) for steroids and tocilizumab to reduce mortality in adult patients with COVID-19. METHODS: We retrospectively reviewed adult patients with COVID-19 admitted to an ICU between March 2020 and June 2020. The primary outcome was ICU mortality. We used TMLE with an ensemble of machine learning algorithms as the primary model [7]. The machine learning ensemble included Logistic Regression, a Neural Network, Naive Bayes, and XGBoost [8]. The analysis was performed on corticosteroid and tocilizumab administration separately. The covariates included for the corticosteroid group were age, ethnicity, oxygen support level, and tocilizumab treatment. The tocilizumab analysis covariates included age, ethnicity, oxygen support, ECMO, and treatment with corticosteroids. 
Our primary metric was the Average Treatment Effect (ATE); secondary metrics were the Odds Ratio and Risk Ratio. RESULTS: Using TMLE, mortality analysis on the corticosteroids group (n=199) demonstrated an ATE (RD) of -0.259, 95% CI [-0.387, -0.13], risk ratio (RR) 0.512, 95% CI [0.364, 0.72] and odds ratio (OR) of 0.33, 95% CI [0.188, 0.581]. The tocilizumab group (n=199) demonstrated an ATE (RD) of 0.104, 95% CI [-0.025, 0.232], RR 1.343, 95% CI [0.916, 1.97] and OR of 1.578, 95% CI [0.885, 2.813]. CONCLUSION: Causal analysis is a useful analysis model to evaluate treatment effects. In this cohort of adult patients with COVID-19, tocilizumab did not demonstrate a mortality difference, whereas corticosteroids were associated with decreased mortality.
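
The three effect measures reported in this abstract (risk difference, risk ratio, odds ratio) are all functions of the same pair of marginal risks, so they can be cross-checked against each other. The sketch below is an illustration under that assumption: it back-calculates the implied risks from the reported RD and RR, then recomputes the OR. The implied risks are derived values, not figures stated in the abstract, the function names are hypothetical, and small discrepancies from rounding are expected.

```python
from typing import Tuple


def effect_measures(risk_treated: float, risk_control: float) -> Tuple[float, float, float]:
    """Return (risk difference, risk ratio, odds ratio) for two marginal risks."""
    def odds(p: float) -> float:
        return p / (1.0 - p)

    rd = risk_treated - risk_control
    rr = risk_treated / risk_control
    odds_ratio = odds(risk_treated) / odds(risk_control)
    return rd, rr, odds_ratio


def risks_from_rd_rr(rd: float, rr: float) -> Tuple[float, float]:
    """Solve p1 - p0 = rd and p1 / p0 = rr for (p1, p0); requires rr != 1."""
    if rr == 1.0:
        raise ValueError("risks are not identifiable from RD and RR when RR equals 1")
    p0 = rd / (rr - 1.0)
    p1 = rr * p0
    return p1, p0


# Reported point estimates from the abstract (RD, RR) for each treatment.
steroid_p1, steroid_p0 = risks_from_rd_rr(rd=-0.259, rr=0.512)
_, _, steroid_or = effect_measures(steroid_p1, steroid_p0)

toci_p1, toci_p0 = risks_from_rd_rr(rd=0.104, rr=1.343)
_, _, toci_or = effect_measures(toci_p1, toci_p0)
```

Under these assumptions the recomputed odds ratios land close to the reported 0.33 and 1.578, which suggests the three reported point estimates for each treatment are mutually consistent.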

2021-05-01 — Open-Source Neural Architecture Search with Ensemble and Pre-trained Networks

Authors: Séamus Lankford, Diarmuid Grimes
Year: 2021
Publication Date: 2021-05-01
Venue: International Journal of Modeling and Optimization
DOI: 10.7763/IJMO.2021.V11.774
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The training and optimization of neural networks using pre-trained, super learner, and ensemble approaches is explored. Neural networks, and in particular Convolutional Neural Networks (CNNs), are often optimized using default parameters. Neural Architecture Search (NAS) enables multiple architectures to be evaluated prior to selection of the optimal architecture. Our contribution is to develop, and make available to the community, a system that integrates open source tools for the neural architecture search (OpenNAS) of image classification models. OpenNAS takes any dataset of grayscale or RGB images and generates the optimal CNN architecture. Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and pre-trained models serve as base learners for ensembles. Meta learner algorithms are subsequently applied to these base learners and the ensemble performance on image classification problems is evaluated. Our results show that a stacked generalization ensemble of heterogeneous models is the most effective approach to image classification within OpenNAS.

2021-05-01 — The Direct Effect of Body Mass Index on Cardiovascular Outcomes among Participants Without Central Obesity by Targeted Maximum Likelihood Estimation

Authors: H. M. Saadati, S. Sabour, M. Mansournia, Y. Mehrabi, S. S. Nazari
Year: 2021
Publication Date: 2021-05-01
Venue: Arquivos Brasileiros de Cardiologia
DOI: 10.36660/abc.20200231
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Background: Body mass index (BMI) is the most widely used index for categorizing a person as obese or non-obese, and it is subject to important limitations. Objective: To evaluate the direct effect of BMI on cardiovascular outcomes in participants without central obesity. Methods: This analysis included 14,983 men and women aged 45-75 years from the Atherosclerosis Risk in Communities (ARIC) study. BMI was measured as general obesity, and waist circumference (WC), waist-to-hip ratio (WHR), and hip circumference as central obesity. Targeted maximum likelihood estimation (TMLE) was used to estimate the total effects (TEs) and the controlled direct effects (CDEs). The proportion of the TE that would be eliminated if all participants were non-obese with respect to central obesity was calculated using the proportion eliminated (PE) index. P<0.05 was considered statistically significant. Analyses were performed with the TMLE R package. Results: The risk of cardiovascular outcomes attributed to BMI was significantly reversed by eliminating obesity as measured by WHR (p<0.001). The proportion eliminated of the BMI effects was more tangible for participants who were non-obese with respect to WC (PE = 127%; 95% CI (126, 128)) and WHR (PE = 97%; 95% CI (96, 98)) for coronary artery disease (CAD), and WHR (PE = 92%; 95% CI (91, 94)) for stroke, respectively. With respect to sex, the proportion eliminated of the BMI effects was more tangible for participants who were non-obese with respect to WHR (PE = 428%; 95% CI (408, 439)) for CAD in men and WC (PE = 99%; 95% CI (89, 111)) for stroke in women, respectively. Conclusion: These results indicate different potential effects of eliminating central obesity on the association between BMI and cardiovascular outcomes in men and women. (Arq Bras Cardiol. 2021; 116(5):879-886)

2021-05-01 — Detecting bid-rigging coalitions in different countries and auction formats

Authors: David Imhof, Hannes Wallimann
Year: 2021
Publication Date: 2021-05-01
DOI: 10.1016/J.IRLE.2021.106016
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
We propose an original application of screening methods using machine learning to detect collusive groups of firms in procurement auctions. As a methodological innovation, we calculate coalition-based screens by forming coalitions of bidders in tenders to flag bid-rigging cartels. Using Swiss, Japanese and Italian procurement data, we investigate the effectiveness of our method in different countries and auction settings, in our cases first-price sealed-bid and mean-price sealed-bid auctions. We correctly classify 90% of the collusive and competitive coalitions when applying four machine learning algorithms: lasso, support vector machine, random forest, and the super learner ensemble method. Finally, we find that coalition-based screens for the variance and the uniformity of bids are in all cases the most important predictors according to the random forest.

2021-04-28 — Predicting Sepsis Mortality in a Population-Based National Database: Machine Learning Approach

Authors: James Yeongjun Park, T. Hsu, Jiun-Ruey Hu, Chun Chen, W. Hsu, Matthew Lee, Joshua Ho, Chien-Chang Lee
Year: 2021
Publication Date: 2021-04-28
Venue: Journal of Medical Internet Research
DOI: 10.2196/29982
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background Although machine learning (ML) algorithms have been applied to point-of-care sepsis prognostication, ML has not been used to predict sepsis mortality in an administrative database. Therefore, we examined the performance of common ML algorithms in predicting sepsis mortality in adult patients with sepsis and compared it with that of the conventional context knowledge–based logistic regression approach. Objective The aim of this study is to examine the performance of common ML algorithms in predicting sepsis mortality in adult patients with sepsis and compare it with that of the conventional context knowledge–based logistic regression approach. Methods We examined inpatient admissions for sepsis in the US National Inpatient Sample using hospitalizations in 2010-2013 as the training data set. We developed four ML models to predict in-hospital mortality: logistic regression with least absolute shrinkage and selection operator regularization, random forest, gradient-boosted decision tree, and deep neural network. To estimate their performance, we compared our models with the Super Learner model. Using hospitalizations in 2014 as the testing data set, we examined the models’ area under the receiver operating characteristic curve (AUC), confusion matrix results, and net reclassification improvement. Results Hospitalizations of 923,759 adults were included in the analysis. Compared with the reference logistic regression (AUC: 0.786, 95% CI 0.783-0.788), all ML models showed superior discriminative ability (P<.001), including logistic regression with least absolute shrinkage and selection operator regularization (AUC: 0.878, 95% CI 0.876-0.879), random forest (AUC: 0.878, 95% CI 0.877-0.880), xgboost (AUC: 0.888, 95% CI 0.886-0.889), and neural network (AUC: 0.893, 95% CI 0.891-0.895). All 4 ML models showed higher sensitivity, specificity, positive predictive value, and negative predictive value compared with the reference logistic regression model (P<.001). 
We obtained similar results from the Super Learner model (AUC: 0.883, 95% CI 0.881-0.885). Conclusions ML approaches can improve sensitivity, specificity, positive predictive value, negative predictive value, discrimination, and calibration in predicting in-hospital mortality in patients hospitalized with sepsis in the United States. These models need further validation and could be applied to develop more accurate models to compare risk-standardized mortality rates across hospitals and geographic regions, paving the way for research and policy initiatives studying disparities in sepsis care.

2021-04-16 — Attributable mortality of acute respiratory distress syndrome: a systematic review, meta-analysis and survival analysis using targeted minimum loss-based estimation

Authors: L. Torres, K. Hoffman, C. Oromendia, Iván Díaz, J. Harrington, E. Schenck, David R. Price, L. Gómez-Escobar, A. Higuera, M. P. Vera, R. Baron, L. Fredenburgh, J. Huh, A. Choi, I. Siempos
Year: 2021
Publication Date: 2021-04-16
Venue: Thorax
DOI: 10.1136/thoraxjnl-2020-215950
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background Although acute respiratory distress syndrome (ARDS) is associated with high mortality, its direct causal link with death is unclear. Clarifying this link is important to justify costly research on prevention of ARDS. Objective To estimate the attributable mortality, if any, of ARDS. Design First, we performed a systematic review and meta-analysis of observational studies reporting mortality of critically ill patients with and without ARDS matched for underlying risk factor. Next, we conducted a survival analysis of prospectively collected patient-level data from subjects enrolled in three intensive care unit (ICU) cohorts to estimate the attributable mortality of critically ill septic patients with and without ARDS using a novel causal inference method. Results In the meta-analysis, 44 studies (47 cohorts) involving 56 081 critically ill patients were included. Mortality was higher in patients with versus without ARDS (risk ratio 2.48, 95% CI 1.86 to 3.30; p<0.001) with a numerically stronger association between ARDS and mortality in trauma than sepsis. In the survival analysis of three ICU cohorts enrolling 1203 critically ill patients, 658 septic patients were included. After controlling for confounders, ARDS was found to increase the mortality rate by 15% (95% CI 3% to 26%; p=0.015). Significant increases in mortality were seen for severe (23%, 95% CI 3% to 44%; p=0.028) and moderate (16%, 95% CI 2% to 31%; p=0.031), but not for mild ARDS. Conclusions ARDS has a direct causal link with mortality. Our findings provide information about the extent to which continued funding of ARDS prevention trials has potential to impart survival benefit. PROSPERO Registration Number CRD42017078313

2021-04-16 — Artificial Intelligence for Prognostic Scores in Oncology: a Benchmarking Study

Authors: H. Loureiro, T. Becker, A. Bauer-Mehren, N. Ahmidi, J. Weberpals
Year: 2021
Publication Date: 2021-04-16
Venue: Frontiers in Artificial Intelligence
DOI: 10.3389/frai.2021.625573
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Introduction: Prognostic scores are important tools in oncology to facilitate clinical decision-making based on patient characteristics. To date, classic survival analysis using Cox proportional hazards regression has been employed in the development of these prognostic scores. With the advance of analytical models, this study aimed to determine if more complex machine-learning algorithms could outperform classical survival analysis methods. Methods: In this benchmarking study, two datasets were used to develop and compare different prognostic models for overall survival in pan-cancer populations: a nationwide EHR-derived de-identified database for training and in-sample testing and the OAK (phase III clinical trial) dataset for out-of-sample testing. A real-world database comprised 136K first-line treated cancer patients across multiple cancer types and was split into a 90% training and 10% testing dataset, respectively. The OAK dataset comprised 1,187 patients diagnosed with non-small cell lung cancer. To assess the effect of the covariate number on prognostic performance, we formed three feature sets with 27, 44 and 88 covariates. In terms of methods, we benchmarked ROPRO, a prognostic score based on the Cox model, against eight complex machine-learning models: regularized Cox, Random Survival Forests (RSF), Gradient Boosting (GB), DeepSurv (DS), Autoencoder (AE) and Super Learner (SL). The C-index was used as the performance metric to compare different models. Results: For in-sample testing on the real-world database the resulting C-index [95% CI] values for RSF 0.720 [0.716, 0.725], GB 0.722 [0.718, 0.727], DS 0.721 [0.717, 0.726] and lastly, SL 0.723 [0.718, 0.728] showed significantly better performance as compared to ROPRO 0.701 [0.696, 0.706]. Similar results were derived across all feature sets. However, for the out-of-sample validation on OAK, the stronger performance of the more complex models was not apparent anymore. 
Consistently, the increase in the number of prognostic covariates did not lead to an increase in model performance. Discussion: The stronger performance of the more complex models did not generalize when applied to an out-of-sample dataset. We hypothesize that future research may benefit by adding multimodal data to exploit advantages of more complex models.

2021-04-15 — Machine Learning-Based Personalized Pharmacological Treatment for Depressive Disorder: A Target Trial Emulation Study (Preprint)

Authors: Chi-Shin Wu, A. Yang, Shu-Sen Chang, Chia-Ming Chang, Yihao Liu, S. Liao, H. Tsai
Year: 2021
Publication Date: 2021-04-15
DOI: 10.2196/preprints.29652
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
BACKGROUND Developing personalized treatment is one way to improve treatment effectiveness for depressive disorder. OBJECTIVE This study aimed to develop and validate the use of machine learning-based prediction models to select personalized pharmacological treatment for patients with depressive disorder. METHODS This study used Taiwan's National Health Insurance Research Database. Patients with diagnoses of depressive disorder between 2003 and 2012 were included in this study. The study outcome was treatment failure, which was defined as psychiatric hospitalisation, self-harm hospitalisation, emergency visits, or treatment change. Predictors included the patients’ demographic variables, clinical characteristics of depression, and medical and psychiatric comorbid conditions. Prediction models based on Super Learner algorithms were trained for the initial and the next-step treatment separately. The personalised treatment strategy was developed by choosing, for each patient, the drug with the lowest probability of treatment failure as the model-selected regimen. We emulated clinical trials to estimate the effect of personalised treatment. RESULTS The area under the curve of the prediction model using Super Learner was 0·627 for the initial treatment and 0·751 for the next-step treatment. Patients treated with model-selected regimens had reduced treatment failure rates, with a 0·84-fold (95% confidence interval [CI] 0·82-0·86) decrease for the initial treatment and a 0·82-fold (95% CI 0·80-0·83) decrease for the next-step treatment. CONCLUSIONS Machine learning-based prediction models for depression outcomes had fair prediction accuracies. In emulating clinical trials, we found the model-selected regimen to be associated with a reduced treatment failure rate. Future randomised controlled trials should be conducted to investigate the effectiveness of the use of machine-learning algorithms in clinical practice.

2021-03-31 — Personal Credit Default Discrimination Model Based on Super Learner Ensemble

Authors: Gang Li, Mengdi Shen, Meixuan Li, Jingyi Cheng
Year: 2021
Publication Date: 2021-03-31
DOI: 10.1155/2021/5586120
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Assessing the default of customers is an essential basis for personal credit issuance. This paper considers developing a personal credit default discrimination model based on Super Learner heterogeneous ensemble to improve the accuracy and robustness of default discrimination. First, we select six kinds of single classifiers such as logistic regression, SVM, and three kinds of homogeneous ensemble classifiers such as random forest to build a base classifier candidate library for Super Learner. Then, we use the ten-fold cross-validation method to train the base classifier to improve the base classifier’s robustness. We compute the base classifier’s total loss using the difference between the predicted and actual values and establish a base classifier-weighted optimization model to solve for the optimal weight of the base classifier, which minimizes the weighted total loss of all base classifiers. Thus, we obtain the heterogeneous ensembled Super Learner classifier. Finally, we use three real credit datasets in the UCI database (Australian, Japanese, and German) and the large credit dataset GMSC published on the Kaggle platform to test the ensembled Super Learner model’s effectiveness. We also employ four commonly used evaluation indicators: the accuracy rate, type I error rate, type II error rate, and AUC. Compared with the base classifier’s classification results and heterogeneous models such as Stacking and Bstacking, the results show that the ensembled Super Learner model has higher discrimination accuracy and robustness.
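Note: the procedure this abstract describes (cross-validated base-learner predictions, then weights chosen to minimize the combined loss) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a regression setting with squared-error loss, two toy base learners, and a grid search over a single convex weight; all function names (`fit_mean`, `fit_line`, `cv_predictions`, `best_weight`, `super_learner`) are hypothetical.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy base learners (assumptions for illustration): each returns a predictor.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def cv_predictions(learners, xs, ys, k=10):
    """Out-of-fold predictions: preds[j][i] is learner j's prediction for
    observation i, made by a model that never saw observation i."""
    n = len(xs)
    preds = [[0.0] * n for _ in learners]
    for fold in kfold_indices(n, k):
        held = set(fold)
        tr_x = [xs[i] for i in range(n) if i not in held]
        tr_y = [ys[i] for i in range(n) if i not in held]
        for j, fit in enumerate(learners):
            model = fit(tr_x, tr_y)
            for i in fold:
                preds[j][i] = model(xs[i])
    return preds

def best_weight(preds, ys, steps=100):
    """Grid search for w in [0, 1] minimizing the squared error of
    w * learner0 + (1 - w) * learner1 on the out-of-fold predictions."""
    def loss(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(preds[0], preds[1], ys))
    return min((i / steps for i in range(steps + 1)), key=loss)

def super_learner(xs, ys, k=10):
    """Return (weight on the mean learner, ensemble predictor)."""
    learners = [fit_mean, fit_line]
    w = best_weight(cv_predictions(learners, xs, ys, k), ys)
    full = [fit(xs, ys) for fit in learners]  # refit on all data
    return w, (lambda x: w * full[0](x) + (1 - w) * full[1](x))
```

The weight is chosen on out-of-fold predictions rather than in-sample fits, so a learner that merely memorizes the training data is not rewarded. A practical library would use many learners and a constrained optimizer in place of the one-dimensional grid.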

2021-03-29 — Comment: Inference after covariate-adaptive randomisation: aspects of methodology and theory

Authors: Bingkai Wang, Ryoko Susukida, R. Mojtabai, M. Amin-Esmaeili, Michael Rosenblum
Year: 2021
Publication Date: 2021-03-29
Venue: Statistical Theory and Related Fields
DOI: 10.1080/24754269.2021.1905591
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We thank the editor for the opportunity to write this commentary on the paper by Jun Shao. The author’s paper gives an excellent review of methods developed for statistical inference when considering covariate-adaptive, randomised trial designs. We would like to mention how the results from our paper (Wang et al., 2020) fit into those described by Jun Shao. Our paper focused on stratified permuted block randomisation (Zelen, 1974) and also biased coin randomisation (Efron, 1971), which are categorised as Type 1 randomisation schemes in the author’s paper. According to a survey by Lin et al. (2015) on 224 randomised clinical trials published in leading medical journals in 2014, stratified permuted block randomization was used by 70% of trials. Our goal is to improve precision of statistical inference by combining covariate-adaptive design and covariate adjustment, while providing robustness to model misspecification. In Section 6 of the author’s paper, the same goal was discussed and a linear model of potential outcomes given covariates was considered. Our results generalise those given for linear-model-based estimators to all M-estimators (under regularity conditions), which covers many estimators used to analyse data from randomised clinical trials. Examples of M-estimators include estimators based on logistic regression (Moore & van der Laan, 2009), inverse probability weighting (Robins et al., 1994), the doubly-robust weighted-least-squares estimator (Robins et al., 2007), the augmented inverse probability weighted estimator (Robins et al., 1994; Scharfstein et al., 1999), and targeted maximum likelihood estimators (TMLE) that converge in 1-step (van der Laan & Gruber, 2012). Our results are able to handle covariate adjustment, various outcome types, repeated measures outcomes and missing outcome data under the missing at random assumption.
Using data from three completed trials of substance use disorder treatments, we estimated that the precision gained due to stratified permuted block randomisation and covariate adjustment ranged from 1% to 36%. Another contribution of our paper is to prove the consistency and asymptotic normality of the Kaplan-Meier estimator under stratified randomization. Its asymptotic variance was also derived. We conjecture that this result can be generalised to cover covariate-adjusted estimators for the survival function, such as estimators by Lu and Tsiatis (2011); Zhang (2015).

2021-03-29 — A two-stage super learner for healthcare expenditures

Authors: Ziyue Wu, Seth A. Berkowitz, P. Heagerty, David C. Benkeser
Year: 2021
Publication Date: 2021-03-29
Venue: Health Services & Outcomes Research Methodology
DOI: 10.1007/s10742-022-00275-x
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021-03-26 — Effect of perinatal depression on risk of adverse infant health outcomes in mother-infant dyads in Gondar town: a causal analysis

Authors: A. F. Dadi, E. Miller, R. Woodman, Telake Azale, Lillian Mwanri
Year: 2021
Publication Date: 2021-03-26
Venue: BMC Pregnancy and Childbirth
DOI: 10.1186/s12884-021-03733-5
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background Approximately one-third of pregnant and postnatal women in Ethiopia experience depression posing a substantial health burden for these women and their families. Although associations between postnatal depression and worse infant health have been observed, there have been no studies to date assessing the causal effects of perinatal depression on infant health in Ethiopia. We applied longitudinal data and recently developed causal inference methods that reduce the risk of bias to estimate associations between perinatal depression and infant diarrhea, Acute Respiratory Infection (ARI), and malnutrition in Gondar Town, Ethiopia. Methods A cohort of 866 mother-infant dyads were followed from infant birth for 6 months and the cumulative incidence of ARI, diarrhea, and malnutrition were assessed. The Edinburgh Postnatal Depression Scale (EPDS) was used to assess the presence of maternal depression, the Integrated Management of Newborn and Childhood Illnesses (IMNCI) guidelines were used to identify infant ARI and diarrhea, and the mid upper arm circumference (MUAC) was used to identify infant malnutrition. The risk difference (RD) due to maternal depression for each outcome was estimated using targeted maximum likelihood estimation (TMLE), a doubly robust causal inference method used to reduce bias in observational studies. Results The cumulative incidence of diarrhea, ARI and malnutrition during 6-month follow-up was 17.0% (95%CI: 14.5, 19.6), 21.6% (95%CI: 18.89, 24.49), and 14.4% (95%CI: 12.2, 16.9), respectively. There was no association between antenatal depression and ARI (RD = −1.3%; 95%CI: −21.0, 18.5), diarrhea (RD = 0.8%; 95%CI: −9.2, 10.9), or malnutrition (RD = −7.3%; 95%CI: −22.0, 21.8). Similarly, postnatal depression was not associated with diarrhea (RD = −2.4%; 95%CI: −9.6, 4.9), ARI (RD = −3.2%; 95%CI: −12.4, 5.9), or malnutrition (RD = 0.9%; 95%CI: −7.6, 9.5).
Conclusion There was no evidence for an association between perinatal depression and the risk of infant diarrhea, ARI, and malnutrition amongst women in Gondar Town. Previous reports suggesting increased risks resulting from maternal depression may be due to unobserved confounding.

2021-03-17 — A super learner ensemble of 14 statistical learning models for predicting COVID-19 severity among patients with cardiovascular conditions

Authors: L. Ehwerhemuepha, Sidy Danioko, S. Verma, R. Marano, W. Feaster, S. Taraman, Tatiana Moreno, Jianwei Zheng, Ehsan Yaghmaei, Anthony Chang
Year: 2021
Publication Date: 2021-03-17
Venue: Intelligent Medicine
DOI: 10.1016/j.ibmed.2021.100030
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021-03-13 — Electronic phenotyping of health outcomes of interest using a linked claims-electronic health record database: Findings from a machine learning pilot project

Authors: T. Gibson, M. Nguyen, Timothy Burrell, Frank Yoon, Jenna Wong, S. Dharmarajan, R. Ouellet-Hellstrom, Wei Hua, Yong Ma, Elande Baro, S. Bloemers, C. Pack, Adee Kennedy, S. Toh, R. Ball
Year: 2021
Publication Date: 2021-03-13
Venue: J. Am. Medical Informatics Assoc.
DOI: 10.1093/jamia/ocab036
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
OBJECTIVE Claims-based algorithms are used in the Food and Drug Administration Sentinel Active Risk Identification and Analysis System to identify occurrences of health outcomes of interest (HOIs) for medical product safety assessment. This project aimed to apply machine learning classification techniques to demonstrate the feasibility of developing a claims-based algorithm to predict an HOI in structured electronic health record (EHR) data. MATERIALS AND METHODS We used the 2015-2019 IBM MarketScan Explorys Claims-EMR Data Set, linking administrative claims and EHR data at the patient level. We focused on a single HOI, rhabdomyolysis, defined by EHR laboratory test results. Using claims-based predictors, we applied machine learning techniques to predict the HOI: logistic regression, LASSO (least absolute shrinkage and selection operator), random forests, support vector machines, artificial neural nets, and an ensemble method (Super Learner). RESULTS The study cohort included 32 956 patients and 39 499 encounters. Model performance (positive predictive value [PPV], sensitivity, specificity, area under the receiver-operating characteristic curve) varied considerably across techniques. The area under the receiver-operating characteristic curve exceeded 0.80 in most model variations. DISCUSSION For the main Food and Drug Administration use case of assessing risk of rhabdomyolysis after drug use, a model with a high PPV is typically preferred. The Super Learner ensemble model without adjustment for class imbalance achieved a PPV of 75.6%, substantially better than a previously used human expert-developed model (PPV = 44.0%). CONCLUSIONS It is feasible to use machine learning methods to predict an EHR-derived HOI with claims-based predictors. Modeling strategies can be adapted for intended uses, including surveillance, identification of cases for chart review, and outcomes research.

2021-03-12 — An Automated Machine Learning-Genetic Algorithm Framework With Active Learning for Design Optimization

Authors: Opeoluwa Owoyele, P. Pal, A. V. Torreira
Year: 2021
Publication Date: 2021-03-12
DOI: 10.1115/1.4050489
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The use of machine learning (ML)-based surrogate models is a promising technique to significantly accelerate simulation-driven design optimization of internal combustion (IC) engines, due to the high computational cost of running computational fluid dynamics (CFD) simulations. However, training the ML models requires hyperparameter selection, which is often done using trial-and-error and domain expertise. Another challenge is that the data required to train these models are often unknown a priori. In this work, we present an automated hyperparameter selection technique coupled with an active learning approach to address these challenges. The technique presented in this study involves the use of a Bayesian approach to optimize the hyperparameters of the base learners that make up a super learner model. In addition to performing hyperparameter optimization (HPO), an active learning approach is employed, where the process of data generation using simulations, ML training, and surrogate optimization is performed repeatedly to refine the solution in the vicinity of the predicted optimum. The proposed approach is applied to the optimization of a compression ignition engine with control parameters relating to fuel injection, in-cylinder flow, and thermodynamic conditions. It is demonstrated that by automatically selecting the best values of the hyperparameters, a 1.6% improvement in merit value is obtained, compared to an improvement of 1.0% with default hyperparameters. Overall, the framework introduced in this study reduces the need for technical expertise in training ML models for optimization while also reducing the number of simulations needed for performing surrogate-based design optimization.

2021-03-03 — Enhancing accuracy and interpretability of machine learning models using super learning and permutation feature importance techniques in digital soil mapping

Authors: Ruhollah Taghizadeh‐Mehrjardi, N. Hamzehpour, M. Hassanzadeh, K. Schmidt, T. Scholten
Year: 2021
Publication Date: 2021-03-03
DOI: 10.5194/EGUSPHERE-EGU21-9382
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
The digital soil mapping (DSM) approach predicts soil characteristics based on the relationship between soil observations and related covariates using machine learning (ML) models. In this research, we applied a wide range of machine learning models (12 base learners) to predict and map soil characteristics. To enhance accuracy and interpretability we combined the base learner predictions using a super learning strategy. However, a major problem of using super learning and complex models is that the share of individual covariates in the overall result cannot be explicitly quantified. To overcome this restriction and make the super learning models interpretable, we employed model-agnostic interpretation tools, for example, permutation feature importance. Particularly, we integrated the weight assigned to each ML base learner obtained by super learning and the ranked ML base learner’s covariates obtained by permutation feature importance to explore the contribution of covariates on the final prediction. We tested our super learning and permutation feature importance techniques to predict and map physicochemical soil characteristics of Urmia Playa Lake (UPL) sediments in Iran. As expected, our results indicated that super learning could significantly improve the ML accuracies for predicting soil characteristics of single base learners. In terms of root mean square error, super learning improved over the performance of the linear regression by an average of 45.7%. Furthermore, the permutation feature importance allowed us to interpret our results better and prove the significant contribution of geomorphological features and groundwater data in predicting soil characteristics of UPL sediments.

2021-02-24 — Enhanced Neural Architecture Search Using Super Learner and Ensemble Approaches

Authors: Séamus Lankford, Diarmuid Grimes
Year: 2021
Publication Date: 2021-02-24
Venue: Asia Service Sciences and Software Engineering Conference
DOI: 10.1145/3456126.3456133
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Neural networks, and in particular Convolutional Neural Networks (CNNs), are often optimized using default parameters. Neural Architecture Search (NAS) enables multiple architectures to be evaluated prior to selection of the optimal architecture. A system integrating open-source tools for Neural Architecture Search (OpenNAS) of image classification problems has been developed and made available to the open-source community. OpenNAS takes any dataset of grayscale, or RGB images, and generates the optimal CNN architecture. The training and optimization of neural networks, using super learner and ensemble approaches, is explored in this research. Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and pretrained models serve as base learners for network ensembles. Meta learner algorithms are subsequently applied to these base learners and the ensemble performance on image classification problems is evaluated. Our results show that a stacked generalization ensemble of heterogeneous models is the most effective approach to image classification within OpenNAS.

2021-02-23 — Robust Data Integration Method for Classification of Biomedical Data

Authors: A. Polewko-Klim, Krzysztof Mnich, W. Rudnicki
Year: 2021
Publication Date: 2021-02-23
Venue: Journal of medical systems
DOI: 10.1007/s10916-021-01718-7
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
We present a protocol for integrating two types of biological data – clinical and molecular – for more effective classification of patients with cancer. The proposed approach is a hybrid between early and late data integration strategy. In this hybrid protocol, the set of informative clinical features is extended by the classification results based on molecular data sets. The results are then treated as new synthetic variables. The hybrid protocol was applied to METABRIC breast cancer samples and TCGA urothelial bladder carcinoma samples. Various data types were used for clinical endpoint prediction: clinical data, gene expression, somatic copy number aberrations, RNA-Seq, methylation, and reverse phase protein array. The performance of the hybrid data integration was evaluated with a repeated cross validation procedure and compared with other methods of data integration: early integration and late integration via super learning. The hybrid method gave similar results to those obtained by the best of the tested variants of super learning. What is more, the hybrid method allowed for further sensitivity analysis and recursive feature elimination, which led to compact predictive models for cancer clinical endpoints. For breast cancer, the final model consists of eight clinical variables and two synthetic features obtained from molecular data. For urothelial bladder carcinoma, only two clinical features and one synthetic variable were necessary to build the best predictive model. We have shown that the inclusion of the synthetic variables based on the RNA expression levels and copy number alterations can lead to improved quality of prognostic tests. Thus, it should be considered for inclusion in wider medical practice.

2021-02-21 — Effect modification of general and central obesity by sex and age on cardiovascular outcomes: Targeted maximum likelihood estimation in the atherosclerosis risk in communities study.

Authors: H. Mozafar Saadati, S. Sabour, M. Mansournia, Y. Mehrabi, S. H. Hashemi Nazari
Year: 2021
Publication Date: 2021-02-21
Venue: Diabetes & metabolic syndrome
DOI: 10.1016/j.dsx.2021.02.024
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2021-02-12 — Estimating the Marginal Causal Effect and Potential Impact of Waterpipe Smoking on Multiple Sclerosis Using Targeted Maximum Likelihood Estimation Method: a Large Population-Based Incident Case-Control Study.

Authors: I. Abdollahpour, S. Nedjat, A. Almasi-Hashiani, M. Nazemipour, M. Mansournia, M. Luque-Fernández
Year: 2021
Publication Date: 2021-02-12
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwab036
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
There are few if any reports regarding the role of lifetime waterpipe smoking in multiple sclerosis (MS) etiology. The authors investigated the association between waterpipe and MS, adjusted for confounders. This was a population-based incident case-control study conducted in Tehran, Iran. Cases (n=547) were 15-50-year-old patients identified from the Iranian Multiple Sclerosis Society between 2013 and 2015. Population-based controls (n=1057) were 15-50-year-olds recruited by random digit telephone dialing. A doubly robust estimation method known as targeted maximum likelihood estimation (TMLE) was used to estimate the marginal risk ratio and odds ratio between waterpipe and MS. Both the estimated RR and OR were 1.70 (95% CI: 1.34, 2.17). The population attributable fraction was 21.4% (95% CI: 4.0%, 38.8%). Subject to the limitations of case-control studies in interpreting associations causally, this study suggests that waterpipe use, or its strongly related but undetermined factors, increases the risk of MS. Further epidemiological studies including nested case-control studies are needed to confirm these results.

2021-02-10 — Separating Algorithms from Questions and Causal Inference with Unmeasured Exposures: An Application to Birth Cohort Studies of Early BMI Rebound.

Authors: I. Aris, Aaron L Sarvet, M. Stensrud, R. Neugebauer, Lingjun Li, M. Hivert, E. Oken, Jessica G. Young
Year: 2021
Publication Date: 2021-02-10
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwab029
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Observational studies reporting adjusted associations between childhood body mass index (BMI) rebound and subsequent cardio-metabolic outcomes have often not given explicit attention to causal inference, including definition of a target causal effect and assumptions for unbiased estimation of that effect. Using data from 649 children in a Boston, Massachusetts-area cohort recruited in 1999-2002, we considered effects of stochastic interventions on a chosen subset of modifiable, yet unmeasured, exposures expected to be associated with early (< age 4 years) BMI rebound (a proxy) on adolescent cardiometabolic outcomes. We consider assumptions under which these effects may be identified with available data. This leads to an analysis where the proxy, rather than exposure, acts as exposure in the algorithm. We applied Targeted Maximum Likelihood Estimation, a doubly-robust approach that naturally incorporates machine learning for nuisance parameters (e.g. propensity score). We estimated a protective effect of an intervention that assigns modifiable exposures according to the distribution in the observational study of those without (vs. with) early BMI rebound for fat-mass index (-1.39 kg/m2; 95% CI -1.63,-0.72), but weaker or no effects for other cardiometabolic outcomes. Our results clarify distinctions between algorithms and causal questions, encouraging explicit thinking in causal inference with complex exposures.

2021-02-10 — Ensemble machine learning prediction and variable importance analysis of 5-year mortality after cardiac valve and CABG operations

Authors: J. Castela Forte, H. Mungroop, F. D. de Geus, Maureen L van der Grinten, H. Bouma, V. Pettilä, T. Scheeren, M. Nijsten, M. Mariani, I. V. D. van der Horst, R. Henning, M. Wiering, A. Epema
Year: 2021
Publication Date: 2021-02-10
Venue: Scientific Reports
DOI: 10.1038/s41598-021-82403-0
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Despite having a similar post-operative complication profile, cardiac valve operations are associated with a higher mortality rate compared to coronary artery bypass grafting (CABG) operations. For long-term mortality, few predictors are known. In this study, we applied an ensemble machine learning (ML) algorithm to 88 routinely collected peri-operative variables to predict 5-year mortality after different types of cardiac operations. The Super Learner algorithm was trained using prospectively collected peri-operative data from 8241 patients who underwent cardiac valve, CABG and combined operations. Model performance and calibration were determined for all models, and variable importance analysis was conducted for all peri-operative parameters. Results showed that the predictive accuracy was the highest for solitary mitral (0.846 [95% CI 0.812–0.880]) and solitary aortic (0.838 [0.813–0.864]) valve operations, confirming that ensemble ML using routine data collected perioperatively can predict 5-year mortality after cardiac operations with high accuracy. Additionally, post-operative urea was identified as a novel and strong predictor of mortality for several types of operation, having a seemingly additive effect to better known risk factors such as age and postoperative creatinine.

2021-02-05 — Characteristics of HIV seroconverters in the setting of universal test and treat: Results from the SEARCH trial in rural Uganda and Kenya

Authors: Marilyn Nyabuti, M. Petersen, E. Bukusi, M. Kamya, F. Mwangwa, J. Kabami, N. Sang, E. Charlebois, L. Balzer, Joshua Schwab, C. Camlin, D. Black, T. Clark, G. Chamie, D. Havlir, J. Ayieko
Year: 2021
Publication Date: 2021-02-05
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0243167
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background Additional progress towards HIV epidemic control requires understanding who remains at risk of HIV infection in the context of high uptake of universal testing and treatment (UTT). We sought to characterize seroconverters and risk factors in the SEARCH UTT trial (NCT01864603), which achieved high uptake of universal HIV testing and ART coverage in 32 communities of adults (≥15 years) in rural Uganda and Kenya. Methods In a pooled cohort of 117,114 individuals with baseline HIV negative test results, we described those who seroconverted within 3 years, calculated gender-specific HIV incidence rates, evaluated adjusted risk ratios (aRR) for seroconversion using multivariable targeted maximum likelihood estimation, and assessed potential infection sources based on self-report. Results Of 704 seroconverters, 63% were women. Young (15–24 years) men comprised a larger proportion of seroconverters in Western Uganda (18%) than Eastern Uganda (6%) or Kenya (10%). After adjustment for other risk factors, men who were mobile [≥1 month of prior year living outside community] (aRR:1.68; 95%CI:1.09,2.60) or who HIV tested at home vs. health fair (aRR:2.44; 95%CI:1.89,3.23) were more likely to seroconvert. Women who were aged ≤24 years (aRR:1.91; 95%CI:1.27,2.90), mobile (aRR:1.49; 95%CI:1.04,2.11), or reported a prior HIV test (aRR:1.34; 95%CI:1.06,1.70), or alcohol use (aRR:2.07; 95%CI:1.34,3.22) were more likely to seroconvert. Among survey responders (N = 607, 86%), suspected infection source was more likely for women than men to be ≥10 years older (28% versus 8%) or a spouse (51% vs. 31%) and less likely to be transactional sex (10% versus 16%). Conclusion In the context of universal testing and treatment, additional strategies tailored to regional variability are needed to address HIV infection risks of young women, alcohol users, mobile populations, and those engaged in transactional sex to further reduce HIV incidence rates.

2021-02-02 — Predicting Patient-Reported Outcomes After Hip and Knee Replacement Surgeries: Examples in Super Learning

Authors: M. Maruszczak, Andrew Wilson, A. Shields, J. Vanderpuye-Orgle, D. Erim, Shubhram Pandey, S. Krikov
Year: 2021
Publication Date: 2021-02-02
DOI: 10.26226/morressier.5fce1a9c9e0a135cbed37593
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2021-01-29 — Performance and Application of Estimators for the Value of an Optimal Dynamic Treatment Rule

Authors: L. Montoya, Jennifer L. Skeem, M. Laan, M. Petersen
Year: 2021
Publication Date: 2021-01-29
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Given an (optimal) dynamic treatment rule, it may be of interest to evaluate that rule, that is, to ask the causal question: what is the expected outcome had every subject received treatment according to that rule? In this paper, we study the performance of estimators that approximate the true value of: 1) an a priori known dynamic treatment rule; 2) the true, unknown optimal dynamic treatment rule (ODTR); 3) an estimated ODTR, a so-called “data-adaptive parameter,” whose true value depends on the sample. Using simulations of point-treatment data, we specifically investigate: 1) the impact of increasingly data-adaptive estimation of nuisance parameters and/or of the ODTR on performance; 2) the potential for improved efficiency and bias reduction through the use of semiparametric efficient estimators; and 3) the importance of sample splitting based on CV-TMLE for accurate inference. In the simulations considered, there was very little cost and many benefits to using the cross-validated targeted maximum likelihood estimator (CV-TMLE) to estimate the value of the true and estimated ODTR; importantly, and in contrast to non-cross-validated estimators, the performance of CV-TMLE was maintained even when highly data-adaptive algorithms were used to estimate both nuisance parameters and the ODTR. In addition, we apply these estimators for the value of the rule to the “Interventions” Study, an ongoing randomized controlled trial, to identify whether assigning cognitive behavioral therapy (CBT) to criminal justice-involved adults with mental illness using an ODTR significantly reduces the probability of recidivism, compared to assigning CBT in a non-individualized way.

2021-01-21 — Examining Obedience Training as a Physical Activity Intervention for Dog Owners: Findings from the Stealth Pet Obedience Training (SPOT) Pilot Study

Authors: Katie Potter, B. Masteller, L. Balzer
Year: 2021
Publication Date: 2021-01-21
Venue: International Journal of Environmental Research and Public Health
DOI: 10.3390/ijerph18030902
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Dog training may strengthen the dog–owner bond, a consistent predictor of dog walking behavior. The Stealth Pet Obedience Training (SPOT) study piloted dog training as a stealth physical activity (PA) intervention. In this study, 41 dog owners who reported dog walking ≤3 days/week were randomized to a six-week basic obedience training class or waitlist control. Participants wore accelerometers and logged dog walking at baseline, 6- and 12-weeks. Changes in PA and dog walking were compared between arms with targeted maximum likelihood estimation. At baseline, participants (39 ± 12 years; females = 85%) walked their dog 1.9 days/week and took 5838 steps/day, on average. At week 6, intervention participants walked their dog 0.7 more days/week and took 480 more steps/day, on average, than at baseline, while control participants walked their dog, on average, 0.6 fewer days/week and took 300 fewer steps/day (difference between arms: 1.3 dog walking days/week; 95% CI = 0.2, 2.5; 780 steps/day, 95% CI = −746, 2307). Changes from baseline were similar at week 12 (difference between arms: 1.7 dog walking days/week; 95% CI = 0.6, 2.9; 1084 steps/day, 95% CI = −203, 2370). Given high rates of dog ownership and low rates of dog walking in the United States, this novel PA promotion strategy warrants further investigation.
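The week-6 between-arm differences quoted in this abstract are consistent with subtracting the control arm's change from the intervention arm's change. A minimal sketch of that arithmetic follows; the function and variable names are illustrative and not taken from the study, and because the reported estimates come from targeted maximum likelihood estimation, agreement with a raw difference should not be assumed in general.

```python
def difference_between_arms(intervention_change, control_change):
    """Difference in mean change from baseline: intervention minus control."""
    return intervention_change - control_change

# Week-6 changes from baseline as quoted in the abstract.
walk_days_diff = difference_between_arms(0.7, -0.6)  # dog walking days/week
steps_diff = difference_between_arms(480, -300)      # steps/day

print(round(walk_days_diff, 1), steps_diff)
```

Both results match the rounded differences reported in the abstract (1.3 dog walking days/week and 780 steps/day).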

2021-01-20 — Tuna classification using super learner ensemble of region-based CNN-grouped 2D-LBP models

Authors: J. Jose, C. Kumar, S. Sureshkumar
Year: 2021
Publication Date: 2021-01-20
DOI: 10.1016/J.INPA.2021.01.001
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021-01-15 — The Impact of Unmeasured Confounding in Observational Studies: a Plasmode Simulation Study of Targeted Maximum Likelihood Estimation

Authors: L. Amusa, T. Zewotir, D. North
Year: 2021
Publication Date: 2021-01-15
DOI: 10.21203/RS.3.RS-145260/V1
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Unmeasured confounding can cause considerable problems in observational studies and may threaten the validity of the estimates of causal treatment effects. There has been discussion on the amount of bias in treatment effect estimates that can occur due to unmeasured confounding. We investigate the robustness of a relatively new causal inference technique, targeted maximum likelihood estimation (TMLE), to the impact of unmeasured confounders. We benchmark TMLE’s performance against the inverse probability of treatment weighting (IPW) method. We utilize a plasmode-like simulation based on variables and parameters from the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT). We evaluated the accuracy and precision of the estimated treatment effects. Though TMLE performed better in most of the scenarios considered, our simulation study results suggest that both methods performed reasonably well in estimating the marginal odds ratio in the presence of unmeasured confounding. Nonetheless, the only remedy to unobserved confounding is controlling for as many available covariates as possible in an observational study, because not even TMLE can provide a safeguard against bias from unmeasured confounders.

2021-01-15 — Higher Order Targeted Maximum Likelihood Estimation

Authors: M. Laan, Ze-Yu Wang, L. Laan
Year: 2021
Publication Date: 2021-01-15
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted maximum likelihood estimation, tmle

Abstract:
Asymptotic efficiency of targeted maximum likelihood estimators (TMLE) of target features of the data distribution relies on a second order remainder being asymptotically negligible. In previous work we proposed a nonparametric MLE termed Highly Adaptive Lasso (HAL) which parametrizes the relevant functional of the data distribution in terms of a multivariate real valued cadlag function that is assumed to have finite variation norm. We showed that the HAL-MLE converges in Kullback-Leibler dissimilarity at a rate $n^{-1/3}$ up to $\log n$ factors. Therefore, by using HAL as initial density estimator in the TMLE, the resulting HAL-TMLE is an asymptotically efficient estimator only assuming that the relevant nuisance functions of the data density are cadlag and have finite variation norm. However, in finite samples, the second order remainder can dominate the sampling distribution so that inference based on asymptotic normality would be anti-conservative. In this article we propose a new higher order TMLE, generalizing the regular first order TMLE. We prove that it satisfies an exact linear expansion, in terms of efficient influence functions of sequentially defined higher order fluctuations of the target parameter, with a remainder that is a (k+1)-th order remainder. As a consequence, this k-th order TMLE allows statistical inference only relying on the (k+1)-th order remainder being negligible. We also provide a rationale for why the higher order TMLE will be superior to the first order TMLE: it (iteratively) locally minimizes the exact finite sample remainder of the first order TMLE. The second order TMLE is demonstrated for nonparametric estimation of the integrated squared density and for the treatment specific mean outcome. We also provide an initial simulation study for the second order TMLE of the treatment specific mean confirming the theoretical analysis.

2021-01-15 — Association of Medical Male Circumcision and Sexually Transmitted Infections in a Population-Based Study: A Targeted Maximum Likelihood Estimation Approach

Authors: L. Amusa, T. Zewotir, D. North, A. Kharsany, Lara Lewis
Year: 2021
Publication Date: 2021-01-15
DOI: 10.21203/RS.3.RS-137122/V1
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background: Epidemiological theory and many empirical studies support the hypothesis that there is a protective effect of male circumcision against some sexually transmitted infections (STIs). However, there is a paucity of randomized controlled trials (RCTs) to test this hypothesis in the South African population. Due to the infeasibility of conducting RCTs, estimating marginal or average treatment effects with observational data is of increasing interest. Using targeted maximum likelihood estimation (TMLE), a doubly robust estimation technique, we aim to provide evidence of association between medical male circumcision (MMC) and two STI outcomes. Methods: We investigated the associations between MMC and the two STI outcomes, HIV and HSV-2, using data from the HIV Incidence Provincial Surveillance System (HIPSS) study in KwaZulu-Natal, South Africa. We estimated marginal odds ratios using TMLE and compared estimates with those from propensity score full matching and inverse probability of treatment weighting (IPTW). Results: TMLE estimates suggest that MMC was associated with 46.9% lower odds of HIV (OR: 0.531; 95% CI: 0.455, 0.621) and 20.5% lower odds of HSV-2 (OR: 0.795; 95% CI: 0.694, 0.911). The propensity score analyses also provided evidence of association of MMC with lower odds of HIV and HSV-2. For full matching: HIV (OR: 0.546; 95% CI: 0.402, 0.741), and HSV-2 (OR: 0.705; 95% CI: 0.545, 0.910). For IPTW: HIV (OR: 0.541; 95% CI: 0.405, 0.722), and HSV-2 (OR: 0.694; 95% CI: 0.541, 0.889). Conclusion: Using a TMLE approach, we present further evidence of a protective effect of MMC against HIV and HSV-2 in this hyper-endemic South African setting. TMLE has the potential to enhance the evidence base for recommendations that embrace the effect of public health interventions on health or disease outcomes.
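The "percent lower odds" figures in this abstract follow from the reported odds ratios as (1 - OR) × 100. A minimal sketch of that conversion, using only the TMLE point estimates quoted above; the function name is illustrative and not from the paper:

```python
def percent_lower_odds(odds_ratio):
    """Convert an odds ratio below 1 into a percent reduction in odds."""
    return (1.0 - odds_ratio) * 100.0

# TMLE odds ratios as quoted in the abstract.
hiv_reduction = percent_lower_odds(0.531)
hsv2_reduction = percent_lower_odds(0.795)

print(round(hiv_reduction, 1), round(hsv2_reduction, 1))
```

The rounded results agree with the 46.9% and 20.5% figures stated in the abstract.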

2021-01-14 — G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes

Authors: F. Le Borgne, A. Chatton, M. Léger, R. Lenain, Y. Foucher
Year: 2021
Publication Date: 2021-01-14
Venue: Scientific Reports
DOI: 10.1038/s41598-021-81110-0
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and is able to deal with small samples. We evaluated the performances of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a performant method for drawing causal inferences, even from small sample sizes.

2021-01-11 — Body Mass Index Variable Interpolation to Expand the Utility of Real-world Administrative Healthcare Claims Database Analyses

Authors: Bingcao Wu, W. Chow, Monish Sakthivel, Onkargouda Kakade, Kartikeya Gupta, Debra Israel, Yen-Wen Chen, Aarti Susan Kuruvilla
Year: 2021
Publication Date: 2021-01-11
Venue: Advances in Therapy
DOI: 10.1007/s12325-020-01605-6
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Introduction Administrative claims data provide an important source for real-world evidence (RWE) generation, but incomplete reporting, such as for body mass index (BMI), limits the sample sizes that can be analyzed to address certain research questions. The objective of this study was to construct models by implementing machine-learning (ML) algorithms to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²) in administrative healthcare claims databases, and then internally and externally validate them. Methods Five advanced ML algorithms were implemented for each BMI classification on a random sampling of BMI readings from the Optum PanTher Electronic Health Record database (2%) and the Optum Clinformatics Date of Death (20%) database, while incorporating baseline demographic and clinical characteristics. Sensitivity analyses with oversampling ratios were conducted. Model performance was validated internally and externally. Results Models trained on the Super Learner ML algorithm (SLA) yielded the best BMI classification predictive performance. SLA model 1 utilized sociodemographic and clinical characteristics, including baseline BMI values; the area under the receiver operating characteristic curve (ROC AUC) was approximately 88% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 87.9% to 92.8% and specificity ranged from 91.8% to 94.7%. SLA model 2 utilized sociodemographic information and clinical characteristics, excluding baseline BMI values; ROC AUC was approximately 73% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 73.6% to 80.0% and specificity ranged from 71.6% to 85.9%. The external validation on the MarketScan Commercial Claims and Encounters database yielded relatively consistent results with slightly diminished performance. Conclusion This study demonstrated the feasibility and validity of using ML algorithms to predict BMI classifications in administrative healthcare claims data to expand the utility for RWE generation.

2021-01-04 — Right population, right resources, right algorithm: Using machine learning efficiently and effectively in surgical systems where data are a limited resource.

Authors: Lauren Eyler Dang, A. Hubbard, F. Dissak-Delon, A. Chichom Mefire, C. Juillard
Year: 2021
Publication Date: 2021-01-04
Venue: Surgery
DOI: 10.1016/j.surg.2020.11.043
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
There is a growing interest in using machine learning algorithms to support surgical care, diagnostics, and public health surveillance in low- and middle-income countries. From our own experience and the literature, we share several lessons for developing such models in settings where the data necessary for algorithm training and implementation is a limited resource. First, the training cohort should be as similar as possible to the population of interest, and recalibration can be used to improve risk estimates when a model is transported to a new context. Second, algorithms should incorporate existing data sources or data that is easily obtainable by frontline health workers or assistants in order to optimize available resources and facilitate integration into clinical practice. Third, the Super Learner ensemble machine learning algorithm can be used to define the optimal model for a given prediction problem while minimizing bias in the algorithm selection process. By considering the right population, right resources, and right algorithm, researchers can train prediction models that are both context-appropriate and resource-conscious. There remain gaps in data availability, affordable computing capacity, and implementation studies that hinder clinical algorithm development and use in low-resource settings, although these barriers are decreasing over time. We advocate for researchers to create open-source code, apps, and training materials to allow new machine learning models to be adapted to different populations and contexts in order to support surgical providers and health care systems in low- and middle-income countries worldwide.

2021-01-02 — Grade estimation by a machine learning model using coordinate rotations

Authors: Gamze Erdogan Erten, M. Yavuz, C. Deutsch
Year: 2021
Publication Date: 2021-01-02
Venue: Applied Earth Science
DOI: 10.1080/25726838.2021.1872822
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Machine learning (ML) models provide useful tools to generate spatial estimations of geological features, but they do not consider the spatial dependence among the observations and they primarily use coordinates as predictors. Thus, many ML models produce visible artifacts in the resulting estimates along the coordinate directions. To overcome this significant problem, this paper presents an ensemble super learner (ESL) model which uses the super learner (SL) model as the ML model. In the ESL model, numerous training sets are created from the original dataset by a coordinate rotation strategy and then the estimates obtained from the fitted SL models are ensembled to produce a final estimate. A dataset from a high-grade gold deposit demonstrates the approach and compares the results to kriging and the SL model. The results demonstrate that the ESL model manages artifacts in ML spatial estimation. It also provides better results than the kriging and SL model in terms of estimation accuracy.

2021 — Three Layer Super Learner Ensemble with Hyperparameter Optimization to Improve the Performance of Machine Learning Model

Authors: K. T. S. Kasthuriarachchi
Year: 2021
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021 — Super Learner: Stack Generalization Algorithm for AutoML

Authors: Mihir Gada, Zenil Haria, Arnav Mankad, Kaustubh Damania, Smita Sankhe
Year: 2021
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021 — Super Learner Implementation in Corrosion Rate Prediction

Authors: Joshua O. Ighalo, B. Kaminska
Year: 2021
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021 — RTM Super Learner Results at Quality Estimation Task

Authors: Ergun Biçici
Year: 2021
Venue: Conference on Machine Translation
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021 — One-step TMLE to target cause-specific absolute risks and survival curves Tech report

Authors: H. Rytgaard, M. J. Laan
Year: 2021
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2021 — Machine Learning for Predicting Hospital Acquired Pressure Injuries in ICU Patients: From Explainable AI to Ensemble Super Learners

Authors: J. Alderden, Andrew Wilson, S. Krikov, Jonathan B. Dimas, Ryan Butcher, T. Yap
Year: 2021
Venue: American Medical Informatics Association Annual Symposium
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2021 — Forecasting in a Changing World: from the Great Recession to the COVID-19 Pandemic

Authors: M. Artemova, F. Blasques, S. J. Koopman, Zhaokun Zhang
Year: 2021
Venue: Social Science Research Network
DOI: 10.2139/ssrn.3766336
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
We develop a new targeted maximum likelihood estimation method that provides improved forecasting for misspecified linear autoregressive models. The method weighs data points in the observed sample and is useful in the presence of data generating processes featuring structural breaks, complex nonlinearities, or other time-varying properties which cannot be easily captured by model design. Additionally, the method reduces to classical maximum likelihood when the model is well specified, which results in weights which are set uniformly to one. We show how the optimal weights can be set by means of a cross-validation procedure. In a set of Monte Carlo experiments we reveal that the estimation method can significantly improve the forecasting accuracy of autoregressive models. In an empirical study concerned with forecasting the U.S. Industrial Production, we show that the forecast accuracy during the Great Recession can be significantly improved by giving greater weight to observations associated with past recessions. We further establish that the same empirical finding can be found for the 2008-2009 global financial crisis, for different macroeconomic time series, and for the COVID-19 recession in 2020.

2021 — Enhancing the accuracy of machine learning models using the super learner technique in digital soil mapping

Authors: Ruhollah Taghizadeh‐Mehrjardi, N. Hamzehpour, M. Hassanzadeh, Brandon Heung, Maryam Ghebleh Goydaragh, K. Schmidt, T. Scholten
Year: 2021
DOI: 10.1016/J.GEODERMA.2021.115108
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2020 (59 papers)
2020-12-10 — The association of maternal psychosocial stress with newborn telomere length

Authors: M. Izano, L. Cushing, Jue Lin, S. Eick, Dana E. Goin, E. Epel, T. Woodruff, R. Morello-Frosch
Year: 2020
Publication Date: 2020-12-10
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0242064
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Background Telomere length in early life predicts later length, and shortened telomere length among adults and children has been linked to increased risk of chronic disease and mortality. Maternal stress during pregnancy may impact telomere length of the newborn. Methods In a diverse cohort of 355 pregnant women receiving prenatal and delivery care services at two hospitals in San Francisco, California, we investigated the relationship between self-reported maternal psychosocial stressors during the 2nd trimester of pregnancy and telomere length (T/S ratio) in newborn umbilical cord blood leukocytes. We examined financial strain, food insecurity, high job strain, poor neighborhood quality, low standing in one’s community, experience of stressful/traumatic life events, caregiving for a dependent family member, perceived stress, and unplanned pregnancy. We used linear regression and Targeted Minimum Loss-Based Estimation (TMLE) to evaluate the change in the T/S ratio associated with exposure to each stressor controlling for maternal age, education, parity, race/ethnicity, and delivery hospital. Results In TMLE analyses, low community standing (-0.09; 95% confidence interval [CI] -0.19 to 0.00) and perceived stress (-0.07; 95% CI -0.15 to 0.021) were marginally associated with shorter newborn telomere length, but the associations were not significant after adjusting for multiple comparisons. None of the linear regression estimates were statistically significant. Our results also suggest that the association between some maternal stressors and newborn telomere length varies by race/ethnicity and infant sex. Conclusions This study is the first to examine the joint effect of multiple stressors during pregnancy on newborn TL using a flexible modeling approach.

2020-12-07 — MineCap: Detecção de Mineração de Criptomoedas em Redes Corporativas com Aprendizado de Máquina e Prevenção de Abusos com Redes Definidas por Software

Authors: H. N. C. Neto, N. C. Fernandes, Diogo M. F. Mattos
Year: 2020
Publication Date: 2020-12-07
Venue: SBRC Companion
DOI: 10.5753/SBRC_ESTENDIDO.2020.12408
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Unauthorized cryptocurrency mining consumes valuable computing resources and large amounts of energy. This work proposes MineCap, a dynamic, online mechanism for detecting and blocking unauthorized cryptocurrency mining flows using machine learning and software-defined networking. MineCap develops the incremental super learning technique, a variant of the super learner applied to incremental learning. Incremental super learning gives MineCap the precision to classify mining flows while the mechanism learns continuously from incoming data. The results show that the mechanism achieves 98% accuracy, 99% precision, 97% sensitivity, and 99.9% specificity, and avoids problems related to concept drift. The results of this work were submitted to and accepted at an international conference, a national conference, a short course, and an indexed journal, and one further article is under review at a journal.

2020-12-01 — Long-term effects of asthma medication on asthma symptoms: an application of the targeted maximum likelihood estimation

Authors: Carolin Veit, R. Herrera, G. Weinmayr, J. Genuneit, D. Windstetter, C. Vogelberg, E. von Mutius, D. Nowak, K. Radon, Jessica Gerlich, T. Weinmann
Year: 2020
Publication Date: 2020-12-01
Venue: BMC Medical Research Methodology
DOI: 10.1186/s12874-020-01175-9
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Background Long-term effectiveness of asthma control medication has been shown in clinical trials but results from observational studies with children and adolescents are lacking. Marginal structural models estimated using targeted maximum likelihood methods are a novel statistical approach for such studies, as they make it possible to account for time-varying confounders and time-varying treatment. Therefore, we aimed to calculate the long-term risk of reporting asthma symptoms in relation to control medication use in a real-life setting from childhood to adulthood applying targeted maximum likelihood estimation. Methods In the prospective cohort study SOLAR (Study on Occupational Allergy Risks) we followed a German subsample of 121 asthmatic children (9–11 years old) of the ISAAC II cohort (International Study of Asthma and Allergies in Childhood) until the age of 19 to 24. We obtained self-reported questionnaire data on asthma control medication use at baseline (1995–1996) and first follow-up (2002–2003) as well as self-reported asthma symptoms at baseline, first and second follow-up (2007–2009). Three hypothetical treatment scenarios were defined: early sustained intervention, early unsustained intervention and no treatment at all. We performed longitudinal targeted maximum likelihood estimation combined with the Super Learner algorithm to estimate the relative risk (RR) of reporting asthma symptoms at SOLAR I and SOLAR II in relation to the different hypothetical scenarios. Results A hypothetical intervention of early sustained treatment was associated with a statistically significant risk increment of asthma symptoms at second follow-up when compared to no treatment at all (RR: 1.51, 95% CI: 1.19–1.83) or early unsustained intervention (RR: 1.38, 95% CI: 1.11–1.65). Conclusions While we could confirm targeted maximum likelihood estimation to be a usable and robust statistical tool, we did not observe a beneficial effect of asthma control medication on asthma symptoms. Because of the small sample size, lack of data on disease severity, and possible reverse causation, our results should, however, be interpreted with caution.

2020-11-04 — An Automated Machine Learning-Genetic Algorithm (AutoML-GA) Framework With Active Learning for Design Optimization

Authors: Opeoluwa Owoyele, P. Pal, A. V. Torreira
Year: 2020
Publication Date: 2020-11-04
Venue: ASME 2020 Internal Combustion Engine Division Fall Technical Conference
DOI: 10.1115/icef2020-3000
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The use of machine learning (ML) based surrogate models is a promising technique to significantly accelerate simulation-based design optimization of IC engines, due to the high computational cost of running computational fluid dynamics (CFD) simulations. However, surrogate-based optimization for IC engine applications suffers from two main issues. First, training ML models requires hyperparameter selection, often involving trial-and-error combined with domain expertise. The second issue is that the data required to train these models is often unknown a priori. In this work, we present an automated hyperparameter selection technique coupled with an active learning approach to address these challenges. The technique presented in this study involves the use of a Bayesian approach to optimize the hyperparameters of the base learners that make up a Super Learner model to obtain better test performance. In addition to performing hyperparameter optimization (HPO), an active learning approach is employed, where the process of data generation using simulations, ML training, and surrogate optimization, is performed repeatedly to refine the solution in the vicinity of the predicted optimum. The proposed approach is applied to the optimization of a compression ignition engine with control parameters relating to fuel injection, in-cylinder flow, and thermodynamic conditions. It is demonstrated that by automatically selecting the best values of the hyperparameters, a 1.6% improvement in merit value is obtained, compared to an improvement of 1.0% with default hyperparameters. Overall, the framework introduced in this study reduces the need for technical expertise in training ML models for optimization, while also reducing the number of simulations needed for performing surrogate-based design optimization.

2020-11-01 — Super Learning with Repeated Cross Validation

Authors: Krzysztof Mnich, A. Polewko-Klim, A. Golinska, W. Lesiński, W. Rudnicki
Year: 2020
Publication Date: 2020-11-01
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
DOI: 10.1109/ICDMW51313.2020.00089
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
The super learner algorithm was created to combine the results of multiple base learners with the use of cross validation. However, in many cases it does not significantly outperform a simple average of the base results. We propose to apply multiple repeats of cross validation to improve the performance of super learning. Two approaches to the application of repeated cross validation were tested on artificial data sets and on real-life biomedical data sets. One of the approaches, the MEAN OUTPUT strategy, proved to significantly improve the results. To reduce the computational complexity of the algorithm, we suggest the use of 3-fold, rather than the previously recommended 10-fold, validation. The tests showed that this simplification does not affect the super learning results.
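
Illustrative sketch (not from the paper): the general idea of combining base learners through cross validation, and of repeating the cross validation, can be shown in a few lines of plain Python. Everything here is an assumption made for illustration: the two toy base learners, the helper names, the grid search over a single convex weight, and the choice to average out-of-fold predictions across repeats. It is one plausible reading of repeated cross validation and should not be taken as the authors' MEAN OUTPUT strategy.

```python
import random

# Two toy base learners (hypothetical, for illustration only).
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

LEARNERS = [fit_mean, fit_line]

def out_of_fold(xs, ys, k, rng):
    """One round of k-fold CV: each point is predicted by models that never saw it."""
    n = len(xs)
    idx = list(range(n))
    rng.shuffle(idx)
    preds = [[0.0] * n for _ in LEARNERS]
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        for j, fit in enumerate(LEARNERS):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in test:
                preds[j][i] = model(xs[i])
    return preds

def repeated_cv_weight(xs, ys, k=3, repeats=5, seed=0):
    """Average out-of-fold predictions over several CV repeats, then pick the
    convex weight on the second learner that minimises the resulting MSE."""
    rng = random.Random(seed)
    n = len(xs)
    avg = [[0.0] * n for _ in LEARNERS]
    for _ in range(repeats):
        preds = out_of_fold(xs, ys, k, rng)
        for j in range(len(LEARNERS)):
            for i in range(n):
                avg[j][i] += preds[j][i] / repeats

    def mse(w):
        return sum(((1 - w) * avg[0][i] + w * avg[1][i] - ys[i]) ** 2
                   for i in range(n)) / n

    grid = [g / 20 for g in range(21)]  # candidate weights 0.00, 0.05, ..., 1.00
    return min(grid, key=mse)

# Synthetic data: a noisy straight line, so the linear learner should dominate.
data_rng = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [2 * x + 1 + data_rng.gauss(0, 1) for x in xs]
w = repeated_cv_weight(xs, ys)
print("weight on linear learner:", w)
```

A full super learner would typically fit the combination weights for many learners at once (for example by non-negative least squares) rather than by a one-dimensional grid search; the grid is used here only to keep the sketch dependency-free.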

2020-11-01 — Dynamic survival prediction combining landmarking with a machine learning ensemble: Methodology and empirical comparison

Authors: K. Tanner, L. Sharples, R. Daniel, R. Keogh
Year: 2020
Publication Date: 2020-11-01
Venue: Journal of the Royal Statistical Society: Series A (Statistics in Society)
DOI: 10.1111/rssa.12611
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Dynamic prediction models provide predicted survival probabilities that can be updated over time for an individual as new measurements become available. Two techniques for dynamic survival prediction with longitudinal data dominate the statistical literature: joint modelling and landmarking. There is substantial interest in the use of machine learning methods for prediction; however, their use in the context of dynamic survival prediction has been limited. We show how landmarking can be combined with a machine learning ensemble—the Super Learner. The ensemble combines predictions from different machine learning and statistical algorithms with the goal of achieving improved performance. The proposed approach exploits discrete time survival analysis techniques to enable the use of machine learning algorithms for binary outcomes. We discuss practical and statistical considerations involved in implementing the ensemble. The methods are illustrated and compared using longitudinal data from the UK Cystic Fibrosis Registry. Standard landmarking and the landmark Super Learner approach resulted in similar cross‐validated predictive performance, in this case, outperforming joint modelling.

2020-10-26 — Redlines and Greenspace: The Relationship between Historical Redlining and 2010 Greenspace across the United States

Authors: A. Nardone, K. Rudolph, R. Morello-Frosch, J. Casey
Year: 2020
Publication Date: 2020-10-26
Venue: Environmental Health Perspectives
DOI: 10.1289/EHP7495
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Introduction: Redlining, a racist mortgage appraisal practice of the 1930s, established and exacerbated racial residential segregation boundaries in the United States. Investment risk grades assigned >80y ago through security maps from the Home Owners’ Loan Corporation (HOLC) are associated with current sociodemographics and adverse health outcomes. We assessed whether historical HOLC investment grades are associated with 2010 greenspace, a health-promoting neighborhood resource. Objectives: We compared 2010 normalized difference vegetation index (NDVI) across previous HOLC neighborhood grades using propensity score restriction and matching. Methods: Security map shapefiles were downloaded from the Mapping Inequality Project. Neighborhood investment risk grades included A (best, green), B (blue), C (yellow), and D (hazardous, red, i.e., redlined). We used 2010 satellite imagery to calculate the average NDVI for each HOLC neighborhood. Our main outcomes were 2010 annual average NDVI and summer NDVI. We assigned areal-apportioned 1940 census measures to each HOLC neighborhood. We used propensity score restriction, matching, and targeted maximum likelihood estimation to limit model extrapolation, reduce confounding, and estimate the association between HOLC grade and NDVI for the following comparisons: Grades B vs. A, C vs. B, and D vs. C. Results: Across 102 urban areas (4,141 HOLC polygons), annual average ±standard deviation (SD) 2010 NDVI was 0.47 (±0.09), 0.43 (±0.09), 0.39 (±0.09), and 0.36 (±0.10) in Grades A–D, respectively. In analyses adjusted for current ecoregion and census region, 1940s census measures, and 1940s population density, annual average NDVI values in 2010 were estimated at −0.039 (95% CI: −0.045, −0.034), −0.024 (95% CI: −0.030, −0.018), and −0.026 (95% CI: −0.037, −0.015) for Grades B vs. A, C vs. B, and D vs. C, respectively, in the 1930s. 
Discussion: Estimates adjusted for historical characteristics indicate that neighborhoods assigned worse HOLC grades in the 1930s are associated with reduced present-day greenspace. https://doi.org/10.1289/EHP7495

2020-10-14 — Schistosoma mansoni infection is associated with a higher probability of tuberculosis disease in HIV-infected adults in Kenya.

Authors: Taryn A McLaughlin, A. Nizam, Felix Odhiambo Hayara, G. Ouma, A. Campbell, Jeremiah Khayumbi, Joshua Ongalo, S. Ouma, N. Shah, J. Altman, D. Kaushal, Jyothi Rengarajan, J. Ernst, H. Blumberg, L. Waller, N. Gandhi, C. Day, D. Benkeser
Year: 2020
Publication Date: 2020-10-14
Venue: Journal of Acquired Immune Deficiency Syndromes
DOI: 10.1097/QAI.0000000000002536
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
BACKGROUND Helminth infections can modulate immunity to Mycobacterium tuberculosis (Mtb). However, the effect of helminths, including Schistosoma mansoni (SM), on Mtb infection outcomes is less clear. Furthermore, HIV is a known risk factor for tuberculosis (TB) disease and has been implicated in SM pathogenesis. Therefore, it is important to evaluate whether HIV modifies the association between SM and Mtb infection. SETTING HIV-infected and HIV-uninfected adults were enrolled in Kisumu County, Kenya between 2014 and 2017 and categorized into three groups based on Mtb infection status: Mtb-uninfected healthy controls (HC), latent TB infection (LTBI), and active TB disease. Participants were subsequently evaluated for infection with SM. METHODS We used targeted minimum loss estimation and super learning to estimate a covariate-adjusted association between SM and Mtb infection outcomes, defined as the probability of being HC, LTBI or TB. HIV status was evaluated as an effect modifier of this association. RESULTS SM was not associated with differences in baseline demographic or clinical features of participants in this study, nor with additional parasitic infections. Covariate-adjusted analyses indicated that infection with SM was associated with a 4% higher estimated proportion of active TB cases in HIV-uninfected individuals and a 14% higher estimated proportion of active TB cases in HIV-infected individuals. There were no differences in estimated proportions of LTBI cases. CONCLUSIONS We provide evidence that SM infection is associated with a higher probability of active TB disease, particularly in HIV-infected individuals.

2020-10-01 — Correcting the Hallux Valgus Deformity: A Comparison Between Modified Lapidus Procedure and Scarf Osteotomy

Authors: M. Reilly, J. Day, A. MacMahon, Kristin C. Caolo, Bopha Chrea, Nicholas Williams, M. Drakos, S. Ellis
Year: 2020
Publication Date: 2020-10-01
Venue: Foot & Ankle Orthopaedics
DOI: 10.1177/2473011420s00400
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Category: Bunion; Midfoot/Forefoot Introduction/Purpose: Lapidus procedure and Scarf osteotomy are indicated for treatment of mild to moderate hallux valgus. Advantages of modified Lapidus procedure include ability to address severe deformity, first tarsometatarsal arthritis, and first ray hypermobility. Advantages of Scarf osteotomy include greater correction of the distal metatarsal articular angle (DMAA) and greater fixation stability than other techniques. Both procedures have shown good radiographic and clinical outcomes; however, no prior studies have compared these outcomes between the procedures. The aim of this study was to compare clinical and radiographic outcomes between patients with hallux valgus treated with the modified Lapidus procedure or Scarf osteotomy. Methods: In this retrospective cohort study, patients treated by one of seven fellowship-trained foot and ankle surgeons were identified. Inclusion criteria were age greater than 18 years, primary modified Lapidus procedure or Scarf osteotomy for hallux valgus, minimum 1-year postoperative PROMIS scores, and minimum 3-month postoperative radiographs. Revision cases were excluded. Clinical outcomes were assessed using six PROMIS domains: Pain Interference, Pain Intensity, Physical Function, Global Mental Health, Global Physical Health, and Depression. Pre- and postoperative radiographic parameters were measured on AP (HVA, IMA, DMAA, tibial sesamoid position) and lateral (talo-1st-metatarsal angle (Meary’s), Horton index, Seiberg index, sagittal IMA) x-rays. Statistical analysis utilized targeted maximum likelihood estimation, which controlled for confounding by bunion severity by including covariates for baseline HVA and IMA. Statistics were also analyzed in a restricted cohort of mild to moderate severity bunions (HVA<40 and IMA<16; n=57 each). Complications including repeat surgeries, recurrence of deformity, and malunion/nonunion were recorded.
Results: 136 patients (73 Lapidus, 63 Scarf) with average 17.8-month follow-up constituted our study. Both groups demonstrated significant improvement in Global Physical Health, Global Mental Health, and Physical Function, with patients in the Lapidus group showing a significantly greater improvement of 3.6 points (p=0.01) compared to Scarf. After controlling for bunion severity, the probability of having normal postoperative IMA (<10°) was 17% lower (p<0.001) with Scarf compared to Lapidus. This finding was consistent in the restricted cohort of mild to moderate severity bunions. The Lapidus group demonstrated significantly greater correction in Meary’s angle, Seiberg index, and sagittal IMA. Complications in the Lapidus group included one nonunion, three symptomatic implants, and two hallux varus. The Scarf group had one reoperative cheilectomy and one second metatarsal stress fracture. Conclusion: This is the first study to compare both radiographic and patient-reported outcomes between Lapidus procedure and Scarf osteotomy for correction of hallux valgus deformity. While both procedures yielded improvements in outcomes, results suggest that the probability of having a normal postoperative IMA is greater with Lapidus procedure, even when adjusted for severity of deformity. In addition, greater correction reflected in sagittal measurements may further support the role of rotational correction in the Lapidus procedure.

2020-09-26 — hal9001: Scalable highly adaptive lasso regression in R

Authors: N. Hejazi, J. Coyle, M. Laan
Year: 2020
Publication Date: 2020-09-26
Venue: Journal of Open Source Software
DOI: 10.21105/JOSS.02526
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
The hal9001 R package provides a computationally efficient implementation of the highly adaptive lasso (HAL), a flexible nonparametric regression and machine learning algorithm endowed with several theoretically convenient properties. hal9001 pairs an implementation of this estimator with an array of practical variable selection tools and sensible defaults in order to improve the scalability of the algorithm. By building on existing R packages for lasso regression and leveraging compiled code in key internal functions, the hal9001 R package provides a family of highly adaptive lasso estimators suitable for use in both modern large-scale data analysis and cutting-edge research efforts at the intersection of statistics and machine learning, including the emerging subfield of computational causal inference (Wong, 2020).

2020-09-23 — Intercontinental prediction of soybean phenology via hybrid ensemble of knowledge-based and data-driven models

Authors: Ryan F. McCormick, S. Truong, J. Rotundo, Adam P. Gaspar, D. Kyle, F. V. van Eeuwijk, C. Messina
Year: 2020
Publication Date: 2020-09-23
Venue: bioRxiv
DOI: 10.1101/2020.09.22.306506
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
The timing of crop development has significant impacts on management decisions and subsequent yield formation. A large intercontinental dataset recording the timing of soybean developmental stages was used to establish ensembling approaches that leverage both discrete-time dynamical system models of soybean phenology and data-driven, machine-learned models to achieve accurate and interpretable predictions. We demonstrate that the knowledge-based, dynamical models can improve machine learning by generating expert-engineered features. Combining the predictions of the diverse component models via super learning resulted in a mean absolute error of 4.12 and 4.55 days to flowering (R1) and physiological maturity (R7), providing an improvement relative to the best benchmark model error of 6.90 and 15.47 days, respectively. The hybrid intercontinental model applies to a much wider range of management and temperature conditions than previous mechanistic models, enabling improved decision support as alternative cropping systems arise, farm sizes increase, and changes in the global climate continue to accelerate.

2020-09-16 — Highly Adaptive Lasso Conditional Density Estimation [R package haldensify version 0.0.6]

Authors: N. Hejazi, David C. Benkeser, M. Laan
Year: 2020
Publication Date: 2020-09-16
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

2020-09-04 — Carbon-Deficient Titanium Carbide With Highly Enhanced Hardness

Authors: Hui Li, Shuailing Ma, Lixue Chen, Zhuo-Liang Yu
Year: 2020
Publication Date: 2020-09-04
Venue: Frontiers of Physics
DOI: 10.3389/fphy.2020.00364
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We report the synthesis of a polycrystalline specimen of TiC1−x under high-pressure and high-temperature (HPHT) conditions. The carbon vacancy, crystal structure, Vickers hardness, elastic constants, and bond features of the synthesized specimen were investigated. Though the specimens were synthesized with stoichiometric ratio at high pressure, a robust carbon vacancy was observed using energy dispersive and X-ray photoelectron spectrum. TiC1−x exhibits almost the highest asymptotic Vickers hardness in transition-metal light-element (TMLE) compounds. In this study, using Vickers hardness characterization, the asymptotic hardness was found to be 27.1 GPa. This exceeds the hardness of most transition metal borides with high boron concentrations. Based on the first-principles calculation of the Mulliken population of Ti-C bonds, the intrinsic high Vickers hardness of TiC1−x is attributed to the combination of covalent Ti-C bonds and the optimized eight-valence-electron structure, while the extrinsic contribution comes from the hardening effect of carbon defects. This work demonstrates that a higher concentration of light elements or a higher-dimensional light element framework is not the critical factor for higher hardness, and carbon vacancy is another way to strengthen the crystal structure.

2020-09-01 — Impact of discretization of the timeline for longitudinal causal inference methods

Authors: Steve Ferreira Guerra, M. Schnitzer, A. Forget, L. Blais
Year: 2020
Publication Date: 2020-09-01
Venue: Statistics in Medicine
DOI: 10.1002/sim.8710
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
In longitudinal settings, causal inference methods usually rely on a discretization of the patient timeline that may not reflect the underlying data generation process. This article investigates the estimation of causal parameters under discretized data. It presents the implicit assumptions practitioners make but do not acknowledge when discretizing data to assess longitudinal causal parameters. We illustrate that differences in point estimates under different discretizations are due to the data coarsening resulting in both a modified definition of the parameter of interest and loss of information about time‐dependent confounders. We further investigate several tools to advise analysts in selecting a timeline discretization for use with pooled longitudinal targeted maximum likelihood estimation for the estimation of the parameters of a marginal structural model. We use a simulation study to empirically evaluate bias at different discretizations and assess the use of the cross‐validated variance as a measure of data support to select a discretization under a chosen data coarsening mechanism. We then apply our approach to a study on the relative effect of alternative asthma treatments during pregnancy on pregnancy duration. The results of the simulation study illustrate how coarsening changes the target parameter of interest as well as how it may create bias due to a lack of appropriate control for time‐dependent confounders. We also observe evidence that the cross‐validated variance acts well as a measure of support in the data, by being minimized at finer discretizations as the sample size increases.

2020-09-01 — Comparison of machine learning methods for predicting viral failure: a case study using electronic health record data

Authors: Allan Kimaina, Jonathan Dick, A. Delong, S. Chrysanthopoulou, R. Kantor, J. Hogan
Year: 2020
Publication Date: 2020-09-01
Venue: Statistical Communications in Infectious Diseases
DOI: 10.1515/scid-2019-0017
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background Human immunodeficiency virus (HIV) viral failure occurs when antiretroviral therapy fails to suppress and sustain a person’s viral load count below 1,000 copies of viral ribonucleic acid per milliliter. For those newly diagnosed with HIV and living in a setting where healthcare resources are limited, such as a low- and middle-income country, the World Health Organization recommends viral load monitoring six months after initiation of antiretroviral treatment and yearly thereafter. Deviations from this schedule are made in cases where viral failure occurs or at the discretion of the clinician. Failure to detect viral failure in a timely fashion can lead to delayed administration of essential interventions. Clinical prediction models based on information available in the patient medical record are increasingly being developed and deployed for decision support in clinical medicine and public health. This raises the possibility that prediction models can be used to detect potential for viral failure in advance of viral measurements, particularly when those measurements occur infrequently. Objective Our goal is to use electronic health record data from a large HIV care program in Kenya to characterize and compare the predictive accuracy of several statistical machine learning methods for predicting viral failure at the first and second measurements following initiation of antiretroviral therapy. Predictive accuracy is measured in terms of sensitivity, specificity and area under the receiver-operator characteristic curve. Methods We trained and cross-validated 10 statistical machine learning models and algorithms on data from over 10,000 patients in the Academic Model Providing Access to Healthcare care program in western Kenya. These included parametric, non-parametric, ensemble, and Bayesian methods. The input variables included 50 items from the clinical record, hand picked in consultation with clinician experts. 
Predictive accuracy measures were calculated using 10-fold cross validation. Results Viral load failure rate is about 20% in this patient cohort at both the first and second measurements. Ensemble techniques generally outperformed other methods. For predicting viral failure at the first follow up measure, specificity was over 90% for these methods, but sensitivity was typically in the 50–60% range. Predictive accuracy was greater for the second follow up measure, with sensitivities over 80%. Super Learner, gradient boosting and Bayesian additive regression trees consistently outperformed other methods. For a viral failure rate of 20%, the positive predictive value for the top-performing methods is between 75 and 85%, while the negative predictive value is over 95%. Conclusion Evidence from this study suggests that machine learning techniques have potential to identify patients at risk for viral failure prior to their scheduled measurements. Ultimately, prognostic virologic assessment can help guide the administration of earlier targeted intervention such as enhanced drug resistance monitoring, rigorous adherence counseling, or appropriate next-line therapy switching. External validation studies should be used to confirm the results found here.
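
Illustrative worked example (not from the paper): positive and negative predictive values follow from sensitivity, specificity, and prevalence by Bayes' rule. The sketch below assumes a 20% failure rate, as in the abstract, together with hypothetical inputs of 80% sensitivity and 95% specificity chosen only to show the arithmetic; the function name is also an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    tp = sensitivity * prevalence              # true positives per unit population
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Hypothetical inputs: 80% sensitivity, 95% specificity, 20% prevalence.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.95, prevalence=0.20)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these assumed inputs, 0.16 of the population are true positives and 0.04 are false positives, so PPV is 0.16 / 0.20 = 0.80; likewise NPV is 0.76 / 0.80 = 0.95.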

2020-08-24 — WebShell Attack Detection Based on a Deep Super Learner

Authors: Zhuang Ai, Nurbol Luktarhan, AiJun Zhou, Dan Lv
Year: 2020
Publication Date: 2020-08-24
Venue: Symmetry
DOI: 10.3390/sym12091406
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
WebShell is a common network backdoor attack that is characterized by high concealment and great harm. However, conventional WebShell detection methods can no longer cope with complex and flexible variations of WebShell attacks. Therefore, this paper proposes a deep super learner for attack detection. First, the collected data are deduplicated to prevent duplicate data from influencing the result. Second, static and dynamic features are combined to construct a comprehensive feature set for the algorithm. We then use the Word2Vec algorithm to vectorize the features. During this step, to prevent an explosion in the number of features, we use a genetic algorithm to select the effective feature dimensions. Finally, we use a deep super learner to detect WebShell. The experimental results show that this algorithm can effectively detect WebShell, and its accuracy and recall are greatly improved.

2020-08-18 — Estimating the effects of body mass index and central obesity on stroke in diabetics and non‐diabetics using targeted maximum likelihood estimation: Atherosclerosis Risk in Communities study

Authors: H. Mozafar Saadati, Y. Mehrabi, S. Sabour, M. Mansournia, Seyed Saeed Hashemi Nazari
Year: 2020
Publication Date: 2020-08-18
Venue: Obesity Science & Practice
DOI: 10.1002/osp4.447
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
The association of body mass index (BMI) with the risk of cardiovascular disease among diabetic patients is controversial. This study compared the effects of BMI and central obesity on stroke in diabetics and non‐diabetics using targeted maximum likelihood estimation.

2020-08-10 — Nonparametric bootstrap inference for the targeted highly adaptive least absolute shrinkage and selection operator (LASSO) estimator

Authors: Weixin Cai, M. J. van der Laan
Year: 2020
Publication Date: 2020-08-10
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2017-0070
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
The Highly-Adaptive least absolute shrinkage and selection operator (LASSO) Targeted Minimum Loss Estimator (HAL-TMLE) is an efficient plug-in estimator of a pathwise differentiable parameter in a statistical model that at a minimum (and possibly only) assumes that the sectional variation norm of the true nuisance functions (i.e., relevant part of data distribution) is finite. It relies on an initial estimator (HAL-MLE) of the nuisance functions obtained by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm of the candidate functions is bounded by a constant, where this constant can be selected with cross-validation. In this article we establish that the nonparametric bootstrap for the HAL-TMLE, fixing the value of the sectional variation norm at a value larger than or equal to the cross-validation selector, provides a consistent method for estimating the normal limit distribution of the HAL-TMLE. In order to optimize the finite sample coverage of the nonparametric bootstrap confidence intervals, we propose a selection method for this sectional variation norm that is based on running the nonparametric bootstrap for all values of the sectional variation norm larger than the one selected by cross-validation, and subsequently determining a value at which the width of the resulting confidence intervals reaches a plateau. We demonstrate our method for 1) nonparametric estimation of the average treatment effect when observing a covariate vector, binary treatment, and outcome, and for 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution. In addition, we also present simulation results for these two examples demonstrating the excellent finite sample coverage of bootstrap-based confidence intervals.

2020-08-05 — The obesity paradox in critically ill patients: a causal learning approach to a casual finding

Authors: Alexander Decruyenaere, Johan Steen, K. Colpaert, D. Benoit, J. Decruyenaere, S. Vansteelandt
Year: 2020
Publication Date: 2020-08-05
Venue: Critical Care
DOI: 10.1186/s13054-020-03199-5
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background While obesity confers an increased risk of death in the general population, numerous studies have reported an association between obesity and improved survival among critically ill patients. This contrary finding has been referred to as the obesity paradox. In this retrospective study, two causal inference approaches were used to address whether the survival of non-obese critically ill patients would have been improved if they had been obese. Methods The study cohort comprised 6557 adult critically ill patients hospitalized at the Intensive Care Unit of the Ghent University Hospital between 2015 and 2017. Obesity was defined as a body mass index of ≥ 30 kg/m². Two causal inference approaches were used to estimate the average effect of obesity in the non-obese (AON): a traditional approach that used regression adjustment for confounding and that assumed missingness completely at random and a robust approach that used machine learning within the targeted maximum likelihood estimation framework along with multiple imputation of missing values under the assumption of missingness at random. 1754 (26.8%) patients were discarded in the traditional approach because of at least one missing value for obesity status or confounders. Results Obesity was present in 18.9% of patients. The in-hospital mortality was 14.6% in non-obese patients and 13.5% in obese patients. The raw marginal risk difference for in-hospital mortality between obese and non-obese patients was −1.06% (95% confidence interval (CI) −3.23 to 1.11%, P = 0.337). The traditional approach resulted in an AON of −2.48% (95% CI −4.80 to −0.15%, P = 0.037), whereas the robust approach yielded an AON of −0.59% (95% CI −2.77 to 1.60%, P = 0.599). 
Conclusions A causal inference approach that is robust to residual confounding bias due to model misspecification and selection bias due to missing (at random) data mitigates the obesity paradox observed in critically ill patients, whereas a traditional approach results in even more paradoxical findings. The robust approach does not provide evidence that the survival of non-obese critically ill patients would have been improved if they had been obese.

2020-08-01 — Comment: Stabilizing the Doubly-Robust Estimators of the Average Treatment Effect under Positivity Violations

Authors: Fan Li
Year: 2020
Publication Date: 2020-08-01
DOI: 10.1214/20-sts774
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Doubly-robust estimators within the one-step and TMLE frameworks could exhibit finite-sample bias and excess variability under positivity violations. We comment on how the application of a stabilization factor may improve the efficiency property of one-step estimator and TMLE, and the comparisons with their collaborative counterparts using the adaptive propensity scores.

2020-07-02 — Multi-resolution super learner for voxel-wise classification of prostate cancer using multi-parametric MRI

Authors: Jin Jin, Lin Zhang, E. Leng, G. Metzger, J. Koopmeiners
Year: 2020
Publication Date: 2020-07-02
Venue: Journal of Applied Statistics
DOI: 10.1080/02664763.2021.2017411
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Multi-parametric MRI (mpMRI) is a critical tool in prostate cancer (PCa) diagnosis and management. To further advance the use of mpMRI in patient care, computer aided diagnostic methods are under continuous development for supporting/supplanting standard radiological interpretation. While voxel-wise PCa classification models are the gold standard, few if any approaches have incorporated the inherent structure of the mpMRI data, such as spatial heterogeneity and between-voxel correlation, into PCa classification. We propose a machine learning-based method to fill this gap. Our method uses an ensemble learning approach to capture regional heterogeneity in the data, where classifiers are developed at multiple resolutions and combined using the super learner algorithm, and further accounts for between-voxel correlation through a Gaussian kernel smoother. It allows any type of classifier to be the base learner and can be extended to further classify PCa sub-categories. We introduce the algorithms for binary PCa classification, as well as for classifying the ordinal clinical significance of PCa for which a weighted likelihood approach is implemented to improve the detection of less prevalent cancer categories. The proposed method has shown important advantages over conventional modeling and machine learning approaches in simulations and application to our motivating patient data.

2020-06-29 — Development and validation of a Super learner-based model for predicting survival in Chinese Han patients with resected colorectal cancer.

Authors: Jiqing Li, J. Gu, Yuan Lu, Xiaoqing Wang, S. Si, F. Xue
Year: 2020
Publication Date: 2020-06-29
Venue: Japanese Journal of Clinical Oncology
DOI: 10.1093/jjco/hyaa103
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
OBJECTIVE: Improved prognostic prediction for patients with colorectal cancer remains an important challenge. This study aimed to develop an effective prognostic model for predicting survival in resected colorectal cancer patients through the implementation of the Super learner. METHODS: A total of 2333 patients who met the inclusion criteria were enrolled in the cohort. We used multivariate Cox regression analysis to identify significant prognostic factors and Super learner to construct prognostic models. Prediction models were internally validated by 10-fold cross-validation and externally validated with a dataset from The Cancer Genome Atlas. Discrimination and calibration were evaluated by Harrell's concordance index (C-index) and calibration plots, respectively. RESULTS: Age, T stage, N stage, histological type, tumor location, lymph-vascular invasion, preoperative carcinoembryonic antigen and sample lymph nodes were integrated into prediction models. The concordance index of Super learner-based prediction model (SLM) was 0.792 (95% confidence interval: 0.767-0.818), which is higher than that of the seventh edition American Joint Committee on Cancer TNM staging system 0.689 (95% confidence interval: 0.672-0.703) for predicting overall survival (P < 0.05). In the external validation, the concordance index of the SLM for predicting overall survival was also higher than that of tumor-node-metastasis (TNM) stage system (0.764 vs. 0.682, respectively; P < 0.001). In addition, the SLM showed good calibration properties. CONCLUSIONS: We developed and externally validated an effective prognosis prediction model based on Super learner, which offered more reliable and accurate prognosis prediction and may be used to more accurately identify high-risk patients who need more active surveillance in patients with resected colorectal cancer.

2020-06-27 — The Scalable Highly Adaptive Lasso [R package hal9001 version 0.2.6]

Authors: J. Coyle, N. Hejazi, M. Laan
Year: 2020
Publication Date: 2020-06-27
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

2020-06-24 — Using electronic health records to identify candidates for human immunodeficiency virus pre‐exposure prophylaxis: An application of super learning to risk prediction when the outcome is rare

Authors: Susan Gruber, D. Krakower, John T. Menchaca, K. Hsu, Rebecca Hawrusik, Judith C Maro, N. Cocoros, B. Kruskal, I. Wilson, K. Mayer, M. Klompas
Year: 2020
Publication Date: 2020-06-24
Venue: Statistics in Medicine
DOI: 10.1002/sim.8591
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Human immunodeficiency virus (HIV) pre‐exposure prophylaxis (PrEP) protects high risk patients from becoming infected with HIV. Clinicians need help to identify candidates for PrEP based on information routinely collected in electronic health records (EHRs). The greatest statistical challenge in developing a risk prediction model is that acquisition is extremely rare. Methods: Data consisted of 180 covariates (demographic, diagnoses, treatments, prescriptions) extracted from records on 399 385 patients (150 cases) seen at Atrius Health (2007‐2015), a clinical network in Massachusetts. Super learner is an ensemble machine learning algorithm that uses k‐fold cross validation to evaluate and combine predictions from a collection of algorithms. We trained 42 variants of sophisticated algorithms, using different sampling schemes that more evenly balanced the ratio of cases to controls. We compared super learner's cross validated area under the receiver operating curve (cv‐AUC) with that of each individual algorithm. Results: The least absolute shrinkage and selection operator (lasso) using a 1:20 class ratio outperformed the super learner (cv‐AUC = 0.86 vs 0.84). A traditional logistic regression model restricted to 23 clinician‐selected main terms was slightly inferior (cv‐AUC = 0.81). Conclusion: Machine learning was successful at developing a model to predict 1‐year risk of acquiring HIV based on a physician‐curated set of predictors extracted from EHRs.

2020-06-24 — Super LeArner Prediction of NAb Panels (SLAPNAP): A Containerized Tool for Predicting Combination Monoclonal Broadly Neutralizing Antibody Sensitivity

Authors: David C. Benkeser, B. Williamson, Craig A. Magaret, Sohail Nizam, Peter B. Gilbert
Year: 2020
Publication Date: 2020-06-24
Venue: bioRxiv
DOI: 10.1101/2020.06.23.167718
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Summary: Single broadly neutralizing antibody (bnAb) regimens are currently being evaluated in randomized trials for prevention efficacy against HIV-1 infection. Subsequent trials will evaluate combination bnAb regimens (e.g., cocktails, multi-specific antibodies), which demonstrate higher potency and breadth in vitro compared to single bnAbs. Given the large number of potential regimens in the research pipeline, methods for down-selecting these regimens into efficacy trials are of great interest. To aid the down-selection process, we developed Super LeArner Prediction of NAb Panels (SLAPNAP), a software tool for training and evaluating machine learning models that predict in vitro neutralization resistance of HIV Envelope pseudoviruses to a given single or combination bnAb regimen, based on Envelope amino acid sequence features. SLAPNAP also provides measures of variable importance of sequence features. These results can rank bnAb regimens by their potential prevention efficacy and aid assessments of how prevention efficacy depends on sequence features. Availability and Implementation: SLAPNAP is a freely available Docker image that can be downloaded from DockerHub (https://hub.docker.com/r/slapnap/slapnap). Source code and documentation are available at GitHub (respectively, https://github.com/benkeser/slapnap and https://benkeser.github.io/slapnap/). Contact: David Benkeser, benkeser@emory.edu

2020-06-16 — Model Agnostic Combination for Ensemble Learning

Authors: Ohad Silbert, Yitzhak Peleg, E. Kopelowitz
Year: 2020
Publication Date: 2020-06-16
Venue: arXiv.org
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Ensemble of models is well known to improve single model performance. We present a novel ensembling technique coined MAC that is designed to find the optimal function for combining models while remaining invariant to the number of sub-models involved in the combination. Being agnostic to the number of sub-models enables addition and replacement of sub-models to the combination even after deployment, unlike many of the current methods for ensembling such as stacking, boosting, mixture of experts and super learners that lock the models used for combination during training and therefore need retraining whenever a new model is introduced into the ensemble. We show that on the Kaggle RSNA Intracranial Hemorrhage Detection challenge, MAC outperforms classical average methods, demonstrates competitive results to boosting via XGBoost for a fixed number of sub-models, and outperforms it when adding sub-models to the combination without retraining.

2020-06-15 — tmleCommunity: A R Package Implementing Target Maximum Likelihood Estimation for Community-level Data

Authors: Chi Zhang, J. Ahern, M. Laan, Oleg Sofrygin
Year: 2020
Publication Date: 2020-06-15
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Over the past years, many applications aim to assess the causal effect of treatments assigned at the community level, while data are still collected at the individual level among individuals of the community. In many cases, one wants to evaluate the effect of a stochastic intervention on the community, where all communities in the target population receive probabilistically assigned treatments based on a known specified mechanism (e.g., implementing a community-level intervention policy that targets stochastic changes in the behavior of a target population of communities). The tmleCommunity package was recently developed to implement targeted minimum loss-based estimation (TMLE) of the effect of community-level intervention(s) at a single time point on an individual-based outcome of interest, including the average causal effect. Implementations of the inverse-probability-of-treatment-weighting (IPTW) and the G-computation formula (GCOMP) are also available. The package supports multivariate arbitrary (i.e., static, dynamic or stochastic) interventions with a binary or continuous outcome. Besides, it allows user-specified data-adaptive machine learning algorithms through SuperLearner, sl3 and h2oEnsemble packages. The usage of the tmleCommunity package, along with a few examples, will be described in this paper.

2020-06-15 — Targeted Maximum Likelihood Estimation of Community-based Causal Effect of Community-Level Stochastic Interventions

Authors: Chi Zhang, J. Ahern, M. Laan
Year: 2020
Publication Date: 2020-06-15
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Unlike the commonly used parametric regression models such as mixed models, which can easily violate the required statistical assumptions and result in invalid statistical inference, targeted maximum likelihood estimation allows more realistic data-generative models and provides double-robust, semi-parametric and efficient estimators. Targeted maximum likelihood estimators (TMLEs) for the causal effect of a community-level static exposure were previously proposed by Balzer et al. In this manuscript, we build on this work and present identifiability results and develop two semi-parametric efficient TMLEs for the estimation of the causal effect of the single time-point community-level stochastic intervention whose assignment mechanism can depend on measured and unmeasured environmental factors and its individual-level covariates. The first community-level TMLE is developed under a general hierarchical non-parametric structural equation model, which can incorporate pooled individual-level regressions for estimating the outcome mechanism. The second individual-level TMLE is developed under a restricted hierarchical model in which the additional assumption of no covariate interference within communities holds. The proposed TMLEs have several crucial advantages. First, both TMLEs can make use of individual level data in the hierarchical setting, and potentially reduce finite sample bias and improve estimator efficiency. Second, the stochastic intervention framework provides a natural way for defining and estimating causal effects where the exposure variables are continuous or discrete with multiple levels, or even cannot be directly intervened on. Also, the positivity assumption needed for our proposed causal parameters can be weaker than the version of positivity required for other causal parameters.

2020-06-07 — Targeted Maximum Likelihood Estimation [R package tmle version 1.5.0-1]

Authors: Susan Gruber, M. J. Laan
Year: 2020
Publication Date: 2020-06-07
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2020-06-01 — SAT0587 MACHINE-LEARNING DERIVED ALGORITHMS FOR OUTCOMES PREDICTION IN RHEUMATIC DISEASES: APPLICATION TO RADIOGRAPHIC PROGRESSION IN EARLY AXIAL SPONDYLOARTHRITIS

Authors: R. Garofoli, M. Resche-Rigon, M. Dougados, D. Heijde, C. Roux, A. Moltó
Year: 2020
Publication Date: 2020-06-01
DOI: 10.1136/ANNRHEUMDIS-2020-EULAR.431
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Axial spondyloarthritis (axSpA) is a chronic rheumatic disease that encompasses various clinical presentations: inflammatory chronic back pain, peripheral manifestations and extra-articular manifestations. The current nomenclature divides axSpA into radiographic (in the presence of radiographic sacroiliitis) and non-radiographic (in the absence of radiographic sacroiliitis, with or without MRI sacroiliitis). Given that the functional burden of the disease appears to be greater in patients with radiographic forms, it seems crucial to be able to predict which patients will be more likely to develop structural damage over time. Predictive factors for radiographic progression in axSpA have been identified through use of traditional statistical models like logistic regression. However, these models present some limitations. In order to overcome these limitations and to improve the predictive performance, machine learning (ML) methods have been developed. Objectives: To compare ML models to traditional models to predict radiographic progression in patients with early axSpA. Methods: Study design: prospective French multicentric cohort study (DESIR cohort) with 5 years of follow-up. Patients: all patients included in the cohort, i.e. 708 patients with inflammatory back pain for >3 months but Results: 10-fold cv-AUC for traditional models were 0.79 and 0.78 for M2 and M3, respectively. The 3 best models in the ML algorithm were the GAM, the DBARTS and the Super Learner models, with 10-fold cv-AUC of: 0.77, 0.76 and 0.74, respectively (Table 1). Conclusion: Traditional models predicted better radiographic progression than ML models in this early axSpA population. Further ML algorithms image-based or with other artificial intelligence methods (e.g. deep learning) might perform better than traditional models in this setting. Acknowledgments: Thanks to the French National Society of Rheumatology and the DESIR cohort.
Disclosure of Interests: Romain Garofoli: None declared, Matthieu Resche-Rigon: None declared, Maxime Dougados Grant/research support from: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma, Consultant of: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma, Speakers bureau: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma, Desiree van der Heijde Consultant of: AbbVie, Amgen, Astellas, AstraZeneca, BMS, Boehringer Ingelheim, Celgene, Cyxone, Daiichi, Eisai, Eli-Lilly, Galapagos, Gilead Sciences, Inc., Glaxo-Smith-Kline, Janssen, Merck, Novartis, Pfizer, Regeneron, Roche, Sanofi, Takeda, UCB Pharma; Director of Imaging Rheumatology BV, Christian Roux: None declared, Anna Molto Grant/research support from: Pfizer, UCB, Consultant of: Abbvie, BMS, MSD, Novartis, Pfizer, UCB

2020-05-22 — Nonparametric inverse‐probability‐weighted estimators based on the highly adaptive lasso

Authors: Ashkan Ertefaie, N. Hejazi, M. J. van der Laan
Year: 2020
Publication Date: 2020-05-22
Venue: Biometrics
DOI: 10.1111/biom.13719
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Inverse‐probability‐weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudopopulation in which selection biases are eliminated. Despite their ease of use, these estimators require the correct specification of a model for the weighting mechanism, are known to be inefficient, and suffer from the curse of dimensionality. We propose a class of nonparametric inverse‐probability‐weighted estimators in which the weighting mechanism is estimated via undersmoothing of the highly adaptive lasso, a nonparametric regression function proven to converge at nearly $n^{-1/3}$‐rate to the true weighting mechanism. We demonstrate that our estimators are asymptotically linear with variance converging to the nonparametric efficiency bound. Unlike doubly robust estimators, our procedures require neither derivation of the efficient influence function nor specification of the conditional outcome model. Our theoretical developments have broad implications for the construction of efficient inverse‐probability‐weighted estimators in large statistical models and a variety of problem settings. We assess the practical performance of our estimators in simulation studies and demonstrate use of our proposed methodology with data from a large‐scale epidemiologic study.

2020-05-16 — The peril of power: a tutorial on using simulation to better understand when and how we can estimate mediating effects.

Authors: K. Rudolph, Dana E. Goin, E. Stuart
Year: 2020
Publication Date: 2020-05-16
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwaa083
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Mediation analyses are valuable for examining mechanisms underlying an association, investigating possible explanations for nonintuitive results, or identifying interventions that can improve health in the context of nonmanipulable exposures. However, designing a study for the purpose of answering a mediation-related research question remains challenging because sample size and power calculations for mediation analyses are typically not conducted or are crude approximations. Consequently, many studies are probably conducted without first establishing that they have the statistical power required to detect a meaningful effect, potentially resulting in wasted resources. In an effort to advance more accurate power calculations for estimating direct and indirect effects, we present a tutorial demonstrating how to conduct a flexible, simulation-based power analysis. In this tutorial, we compare power to estimate direct and indirect effects across various estimators (the Baron and Kenny estimator (J Pers Soc Psychol. 1986;51(6):1173–1182), inverse odds ratio weighting, and targeted maximum likelihood estimation) using various data structures designed to mimic important features of real data. We include step-by-step commented R code (R Foundation for Statistical Computing, Vienna, Austria) in an effort to lower implementation barriers to ultimately improving power assessment in mediation studies.

2020-05-08 — The Impact of Same-Day Antiretroviral Therapy Initiation Under the World Health Organization Treat-All Policy

Authors: B. Kerschberger, A. Boulle, Rudo Kuwengwa, I. Ciglenecki, M. Schomaker
Year: 2020
Publication Date: 2020-05-08
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwab032
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Rapid initiation of antiretroviral therapy (ART) is recommended for people living with human immunodeficiency virus (HIV), with the option to start treatment on the day of diagnosis (same-day ART). However, the effect of same-day ART remains unknown in realistic public sector settings. We established a cohort of ≥16-year-old patients who initiated first-line ART under a treat-all policy in Nhlangano (Eswatini) during 2014–2016, either on the day of HIV care enrollment (same-day ART) or 1–14 days thereafter (early ART). Directed acyclic graphs, flexible parametric survival analysis, and targeted maximum likelihood estimation (TMLE) were used to estimate the effect of same-day-ART initiation on a composite unfavorable treatment outcome (loss to follow-up, death, viral failure, treatment switch). Of 1,328 patients, 839 (63.2%) initiated same-day ART. The adjusted hazard ratio of the unfavorable outcome was higher, 1.48 (95% confidence interval: 1.16, 1.89), for same-day ART compared with early ART. TMLE suggested that after 1 year, 28.9% of patients would experience the unfavorable outcome under same-day ART compared with 21.2% under early ART (difference: 7.7%; 1.3%–14.1%). This estimate was driven by loss to follow-up and varied over time, with a higher hazard during the first year after HIV care enrollment and a similar hazard thereafter. We found an increased risk with same-day ART. A limitation was that possible silent transfers were not captured.

2020-05-06 — Using Administrative Data to Predict Suicide After Psychiatric Hospitalization in the Veterans Health Administration System

Authors: R. Kessler, M. Bauer, T. Bishop, Olga V Demler, S. Dobscha, Sarah M Gildea, J. Goulet, E. Karras, J. Kreyenbuhl, S. Landes, Howard Liu, Alexander Luedtke, P. Mair, William H. B. McAuliffe, M. Nock, M. Petukhova, W. Pigeon, N. Sampson, J. Smoller, L. Weinstock, R. Bossarte
Year: 2020
Publication Date: 2020-05-06
Venue: Frontiers in Psychiatry
DOI: 10.3389/fpsyt.2020.00390
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
There is a very high suicide rate in the year after psychiatric hospital discharge. Intensive postdischarge case management programs can address this problem but are not cost-effective for all patients. This issue can be addressed by developing a risk model to predict which inpatients might need such a program. We developed such a model for the 391,018 short-term psychiatric hospital admissions of US veterans in Veterans Health Administration (VHA) hospitals 2010–2013. Records were linked with the National Death Index to determine suicide within 12 months of hospital discharge (n=771). The Super Learner ensemble machine learning method was used to predict these suicides for time horizons between 1 week and 12 months after discharge in a 70% training sample. Accuracy was validated in the remaining 30% holdout sample. Predictors included VHA administrative variables and small area geocode data linked to patient home addresses. The models had AUC=.79–.82 for time horizons between 1 week and 6 months and AUC=.74 for 12 months. An analysis of operating characteristics showed that 22.4%–32.2% of patients who died by suicide would have been reached if intensive case management was provided to the 5% of patients with highest predicted suicide risk. Positive predictive value (PPV) at this higher threshold ranged from 1.2% over 12 months to 3.8% per case manager year over 1 week. Focusing on the low end of the risk spectrum, the 40% of patients classified as having lowest risk account for 0%–9.7% of suicides across time horizons. Variable importance analysis shows that 51.1% of model performance is due to psychopathological risk factors, 26.2% to social determinants of health, 14.8% to prior history of suicidal behaviors, and 6.6% to physical disorders. The paper closes with a discussion of next steps in refining the model and prospects for developing a parallel precision treatment model.

2020-05-04 — Simple sensitivity analysis for control selection bias.

Authors: Louisa H. Smith, T. VanderWeele
Year: 2020
Publication Date: 2020-05-04
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001207
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2020-05-01 — Prediction of an Acute Hypotensive Episode During an ICU Hospitalization With a Super Learner Machine-Learning Algorithm

Authors: Ményssa Cherifa, A. Blet, A. Chambaz, E. Gayat, M. Resche-Rigon, R. Pirracchio
Year: 2020
Publication Date: 2020-05-01
Venue: Anesthesia and Analgesia
DOI: 10.1213/ANE.0000000000004539
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
BACKGROUND: Acute hypotensive episodes (AHE), defined as a drop in the mean arterial pressure (MAP) <65 mm Hg lasting at least 5 consecutive minutes, are among the most critical events in the intensive care unit (ICU). They are known to be associated with adverse outcome in critically ill patients. AHE prediction is of prime interest because it could allow for treatment adjustment to predict or shorten AHE. METHODS: The Super Learner (SL) algorithm is an ensemble machine-learning algorithm that we specifically trained to predict an AHE 10 minutes in advance. Potential predictors included age, sex, type of care unit, severity scores, and time-evolving characteristics such as mechanical ventilation, vasopressors, or sedation medication as well as features extracted from physiological signals: heart rate, pulse oximetry, and arterial blood pressure. The algorithm was trained on the Medical Information Mart for Intensive Care dataset (MIMIC II) database. Internal validation was based on the area under the receiver operating characteristic curve (AUROC) and the Brier score (BS). External validation was performed using an external dataset from Lariboisière hospital, Paris, France. RESULTS: Among 1151 patients included, 826 (72%) patients had at least 1 AHE during their ICU stay. Using 1 single random period per patient, the SL algorithm with Haar wavelets transform preprocessing was associated with an AUROC of 0.929 (95% confidence interval [CI], 0.899–0.958) and a BS of 0.08. Using all available periods for each patient, SL with Haar wavelets transform preprocessing was associated with an AUROC of 0.890 (95% CI, 0.886–0.895) and a BS of 0.11. In the external validation cohort, the AUROC reached 0.884 (95% CI, 0.775–0.993) with 1 random period per patient and 0.889 (0.768–1) with all available periods and BSs <0.1. CONCLUSIONS: The SL algorithm exhibits good performance for the prediction of an AHE 10 minutes ahead of time. It allows an efficient, robust, and rapid evaluation of the risk of hypotension that opens the way to routine use.

2020-05-01 — Outliers-Robust CFAR Detector of Gaussian Clutter Based on the Truncated-Maximum-Likelihood-Estimator in SAR Imagery

Authors: Jiaqiu Ai, Qiwu Luo, Xuezhi Yang, Zhiping Yin, Hao Xu
Year: 2020
Publication Date: 2020-05-01
Venue: IEEE Transactions on Intelligent Transportation Systems
DOI: 10.1109/TITS.2019.2911692
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
This paper proposes an outliers-robust constant false-alarm rate (OR-CFAR) detector of Gaussian clutter based on the truncated-maximum-likelihood estimator (TMLE) in SAR imagery. The proposed method aims at elevating the detection performance in multiple-target environment, where the sea clutter samples are often contaminated by the interfering target pixels, the azimuth ambiguities, and the breakwater. As a consequence, the parameters used for statistical modeling are over-estimated, resulting in a degradation of the CFAR detection rate. Inspired by the traditional two-parameter CFAR (TP-CFAR) detector of Gaussian clutter, OR-CFAR designs an adaptive threshold-based clutter truncation method to eliminate the high-intensity outliers from the clutter samples in the local reference window, and the probability density function (PDF) of the sea clutter can be accurately modeled through the newly raised TMLE. Furthermore, the optimal truncation depth used for clutter truncation and PDF modeling is evaluated and selected properly to get the best detection results. The OR-CFAR greatly enhances the CFAR detection rate in multiple-target environment, and it is computationally simple and efficient, which has a great application value. The Chinese Gaofen-3 SAR data are used for experiments to show the better detection performance of OR-CFAR.

2020-05-01 — Effect of Sepsis on Death as Modified by Solid Organ Transplantation

Authors: Kevin S Ackerman, K. Hoffman, I. Díaz, W. Simmons, K. Ballman, R. P. Kodiyanplakkal, E. Schenck
Year: 2020
Publication Date: 2020-05-01
Venue: Open Forum Infectious Diseases
DOI: 10.1093/ofid/ofad148
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background: Patients who have undergone solid organ transplants (SOT) have an increased risk for sepsis compared with the general population. Paradoxically, studies suggest that SOT patients with sepsis may experience better outcomes compared with those without a SOT. However, these analyses used previous definitions of sepsis. It remains unknown whether the more recent definitions of sepsis and modern analytic approaches demonstrate a similar relationship. Methods: Using the Weill Cornell-Critical Care Database for Advanced Research, we analyzed granular physiologic, microbiologic, comorbidity, and therapeutic data in patients with and without SOT admitted to intensive care units (ICUs). We used a survival analysis with a targeted minimum loss-based estimation, adjusting for within-group (SOT and non-SOT) potential confounders to ascertain whether the effect of sepsis, defined by sepsis-3, on 28-day mortality was modified by SOT status. We performed additional analyses on restricted populations. Results: We analyzed 28 431 patients: 439 with SOT and sepsis, 281 with SOT without sepsis, 6793 with sepsis and without SOT, and 20 918 with neither. The most common SOT types were kidney (475) and liver (163). Despite a higher severity of illness in both sepsis groups, the adjusted sepsis-attributable effect on 28-day mortality for non-SOT patients was 4.1% (95% confidence interval [CI], 3.8–4.5) and −14.4% (95% CI, −16.8 to −12) for SOT patients. The adjusted SOT effect modification was −18.5% (95% CI, −21.2 to −15.9). The adjusted sepsis-attributable effect for immunocompromised controls was −3.5% (95% CI, −4.5 to −2.6). Conclusions: Across a large database of patients admitted to ICUs, the sepsis-associated 28-day mortality effect was significantly lower in SOT patients compared with controls.

2020-04-21 — Seroprevalence of antibodies against Chlamydia trachomatis and enteropathogens and distance to the nearest water source among young children in the Amhara Region of Ethiopia

Authors: Kristen Aiemjoy, Solomon Aragie, Dionna M. Wittberg, Z. Tadesse, E. K. Callahan, S. Gwyn, Diana L. Martin, J. Keenan, B. Arnold
Year: 2020
Publication Date: 2020-04-21
Venue: medRxiv
DOI: 10.1371/journal.pntd.0008647
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: The transmission of trachoma, caused by repeat infections with Chlamydia trachomatis, and many enteropathogens are linked to water quantity. We hypothesized that children living further from a water source would have higher exposure to C. trachomatis and enteric pathogens as determined by antibody responses. Methods: We used a multiplex bead assay to measure IgG antibody responses to C. trachomatis, Giardia intestinalis, Cryptosporidium parvum, Entamoeba histolytica, Salmonella enterica, Campylobacter jejuni, enterotoxigenic Escherichia coli (ETEC) and Vibrio cholerae in eluted dried blood spots collected from 2267 children ages 1-9 years in 40 communities in rural Ethiopia in 2016. Linear distance from the child's house to the nearest water source was calculated. We derived seroprevalence cutoffs using external negative control populations, if available, or by fitting finite mixture models. We used targeted maximum likelihood estimation to estimate differences in seroprevalence according to distance to the nearest water source. Results: Seroprevalence among 1-9-year-olds was 43% for C. trachomatis, 28% for S. enterica, 70% for E. histolytica, 54% for G. intestinalis, 96% for C. jejuni, 76% for ETEC and 94% for C. parvum. Seroprevalence increased with age for all pathogens. Median distance to the nearest water source was 473 meters (IQR 268, 719). Children living furthest from a water source had a 12% (95% CI: 2.6, 21.6) higher seroprevalence of S. enterica and a 12.7% (95% CI: 2.9, 22.6) higher seroprevalence of G. intestinalis compared to children living nearest. Conclusion: Seroprevalence for C. trachomatis and enteropathogens was high, with marked increases for most enteropathogens in the first two years of life. Children living further from a water source had higher seroprevalence of S. enterica and G. 
intestinalis, indicating that improving access to water in Ethiopia's Amhara region may reduce exposure to these enteropathogens in young children.

2020-04-21 — Machine Learning for Causal Inference: On the Use of Cross-fit Estimators

Authors: P. Zivich, A. Breskin
Year: 2020
Publication Date: 2020-04-21
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001332
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in complications for inference. Doubly robust cross-fit estimators have been proposed to yield better statistical properties. Methods: We conducted a simulation study to assess the performance of several different estimators for the average causal effect. The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly robust estimators (g-computation, inverse probability weighting) and doubly robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). We estimated nuisance functions with parametric models and ensemble machine learning separately. We further assessed doubly robust cross-fit estimators. Results: With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the doubly robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage. Conclusions: Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.

2020-04-03 — Stacked generalizations in imbalanced fraud data sets using resampling methods

Authors: Kathleen R Kerwin, Nathaniel D. Bastian
Year: 2020
Publication Date: 2020-04-03
Venue: arXiv.org
DOI: 10.1177/1548512920962219
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Predicting fraud is challenging due to inherent issues in the fraud data structure, since the crimes are committed through trickery or deceit with an ever-present moving target of changing modus operandi to circumvent human and system controls. As a national security challenge, criminals continually exploit the electronic financial system to defraud consumers and businesses by finding weaknesses in the system, including in audit controls. This study uses stacked generalization using meta or super learners for improving the performance of algorithms in step one (minimizing the algorithm error rate to reduce its bias in the learning set) and then in step two the results are input into the meta learner with its stacked blended output (with the weakest algorithms learning better). A fundamental key to fraud data is that it is inherently not systematic, and an optimal resampling methodology has yet not been identified. Building a test harness, for all permutations of algorithm sample set pairs, demonstrates that the complex, intrinsic data structures are all thoroughly tested. A comparative analysis on fraud data that applies stacked generalizations provides useful insight to find the optimal mathematical formula for imbalanced fraud data sets necessary to improve upon fraud detection for national security.

2020-04-01 — Integration of human cell lines gene expression and chemical properties of drugs for Drug Induced Liver Injury prediction

Authors: W. Lesiński, Krzysztof Mnich, A. Golinska, W. Rudnicki
Year: 2020
Publication Date: 2020-04-01
Venue: Biology Direct
DOI: 10.1186/s13062-020-00286-z
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI can bring a significant reduction in the cost of clinical trials. In this work we examined whether occurrence of DILI can be predicted using gene expression profile in cancer cell lines and chemical properties of drugs. We used gene expression profiles from 13 human cell lines, as well as molecular properties of drugs to build Machine Learning models of DILI. To this end, we have used a robust cross-validated protocol based on feature selection and Random Forest algorithm. In this protocol we first identify the most informative variables and then use them to build predictive models. The models are first built using data from single cell lines, and chemical properties. Then they are integrated using Super Learner method with several underlying methods for integration. The entire modelling process is performed using nested cross-validation. We have obtained weakly predictive ML models when using either molecular descriptors, or some individual cell lines (AUC ∈(0.55−0.61)). Models obtained with the Super Learner approach have a significantly improved accuracy (AUC=0.73), which allows to divide substances in two categories: low-risk and high-risk.

2020-03-27 — The Impact of Delayed Switch to Second-Line Antiretroviral Therapy on Mortality, Depending on Failure Time Definition and CD4 Count at Failure.

Authors: Helen Bell-Gorrod, M. Fox, A. Boulle, H. Prozesky, R. Wood, F. Tanser, M. Davies, M. Schomaker
Year: 2020
Publication Date: 2020-03-27
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwaa049
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Little is known about the functional relationship of delaying second-line treatment initiation for HIV-positive patients and mortality, given a patient's immune status. We included 7255 patients starting antiretroviral therapy between 2004-2017, from 9 South African cohorts, with virological failure and complete baseline data. We estimated the impact of switch time on the hazard of death using inverse probability of treatment weighting (IPTW) of marginal structural models. The non-linear relationship between month of switch and the 5-year survival probability, stratified by CD4 count at failure, was estimated with targeted maximum likelihood estimation (TMLE). We adjusted for measured time-varying confounding by CD4 count, viral load and visit frequency. 5-year mortality was estimated as 10.5% (2.2%; 18.8%) for immediate switch and as 26.6% (20.9%; 32.3%) for no switch (49.9% if CD4 count<100 cells/mm3). The hazard of death was estimated to be 0.40 (95%CI: 0.33-0.48) times lower if everyone had been switched immediately compared to never. The shorter the delay in switching, the lower the hazard of death, e.g. delaying 30-60 days reduced the hazard 0.52 (0.41-0.65) times, and 60-120 days 0.56 (0.47-0.66) times. Early treatment switch is particularly important for patients with low CD4 counts at failure.

2020-03-26 — Low-Cost, Transcriptional Diagnostic to Accurately Categorize Lymphomas in Low- and Middle-Income Countries

Authors: F. Valvert, Oscar Silva, E. Solorzano, M. Puligandla, Marcos Mauricio Siliézar Tala, Timothy Guyon, Samuel L. Dixon, Nelly López, Francisco López, Robert Terbrueggen, K. Stevenson, Y. Natkunam, David M. Weinstock, Edward L Briercheck
Year: 2020
Publication Date: 2020-03-26
Venue: Social Science Research Network
DOI: 10.2139/ssrn.3564407
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: The lack of access to adequate pathology services is a critical roadblock for both improvements in health and sustainable development across lower- and middle-income countries (LMICs). We hypothesized that a low-cost, parsimonious gene expression assay using paraffin-embedded biopsies from LMICs could distinguish lymphoma subtypes and guide treatment. Methods: We reviewed all biopsies obtained between 2006-2018 for suspicion of lymphoma at INCAN hospital in Guatemala City. Gold-standard diagnoses were established by immunohistochemistry and FISH then binned into 9 categories: nonmalignant, aggressive B-cell, diffuse large B-cell (DLBCL), follicular, Hodgkin, mantle cell, marginal zone, NK/T-cell, or mature T-cell lymphoma. We established a chemical ligation probe-based assay (CLPA) that quantifies expression of 37 genes by capillary electrophoresis for <$10 USD/sample. To assign bins based on gene expression, 13 models were evaluated as candidate base learners and class probabilities from each model were then used as predictors in an extreme gradient boosting super learner. An additional two-class model was developed to classify DLBCL cell-of-origin (COO). Cases with call probabilities <0.6 were classified as indeterminate. Findings: Assay failure occurred in 60 (8·9%)/670 biopsies and was enriched among Hodgkin lymphomas (24·8%). 560 diagnostic samples were divided into 70% (n=397) training and 30% (n=163) validation cohorts. Overall accuracy for the validation cohort was 86% [95% CI; 80-91%]. After excluding 28 (17%) indeterminate calls, accuracy increased to 94% [95% CI; 89-97%]. Accuracy for a cohort of relapsed/refractory biopsies (n=39) was 79% and 88% after excluding indeterminate cases. Accuracy for DLBCL COO classification compared to the Hans IHC algorithm (n=51) was 80% [95% CI; 67-90%]. Interpretation: Machine-learning analysis of gene expression accurately classifies paraffin-embedded lymphoma biopsies from LMICs. 
Low-cost, open source assays could transform diagnosis, subtyping, and assessment of therapeutic targets for patients with cancer worldwide. Funding Statement: American Society of Hematology, US State Department, ASCO, LLS, Celgene and NIH Declaration of Interests: T.G., S.L.D. and R.T. are employees of DxTerity Diagnostics. D.M.W. is a co-founder of Travera, Ajax and Root Diagnostics. He receives consulting or advisory board fees from Magnetar, Bantam, ASELL, Ossium, Myeloid Therapeutics, Daiichi Sankyo, and Elstar. He receives research funding from Daiichi Sankyo and Verastem. The remaining authors declare no conflicts-of-interest. Ethics Approval Statement: This study was approved by the Institutional Review Boards of Dana-Farber Cancer Institute and Stanford University and the Ethics Committee of La Liga Nacional Contra el Cancer Research.

2020-03-18 — Comparing the performance of statistical methods that generalize effect estimates from randomized controlled trials to much larger target populations

Authors: I. Schmid, K. Rudolph, T. Nguyen, H. Hong, Marissa J. Seamans, B. Ackerman, E. Stuart
Year: 2020
Publication Date: 2020-03-18
Venue: Communications in statistics. Simulation and computation
DOI: 10.1080/03610918.2020.1741621
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Policymakers use results from randomized controlled trials to inform decisions about whether to implement treatments in target populations. Various methods—including inverse probability weighting, outcome modeling, and Targeted Maximum Likelihood Estimation—that use baseline data available in both the trial and target population have been proposed to generalize the trial treatment effect estimate to the target population. Often the target population is significantly larger than the trial sample, which can cause estimation challenges. We conduct simulations to compare the performance of these methods in this setting. We vary the size of the target population, the proportion of the target population selected into the trial, and the complexity of the true selection and outcome models. All methods performed poorly when the trial size was only 2% of the target population size or the target population included only 1,000 units. When the target population or the proportion of units selected into the trial was larger, some methods, such as outcome modeling using Bayesian Additive Regression Trees, performed well. We caution against generalizing using these existing approaches when the target population is much larger than the trial sample and advocate future research strives to improve methods for generalizing to large target populations.

2020-03-18 — Bootstrap Bias Corrected Cross Validation Applied to Super Learning

Authors: Krzysztof Mnich, A. Golinska, A. Polewko-Klim, W. Rudnicki
Year: 2020
Publication Date: 2020-03-18
Venue: International Conference on Conceptual Structures
DOI: 10.1007/978-3-030-50420-5_41
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Super learner algorithm can be applied to combine results of multiple base learners to improve quality of predictions. The default method for verification of super learner results is by nested cross validation; however, this technique is very expensive computationally. It has been proposed by Tsamardinos et al., that nested cross validation can be replaced by resampling for tuning hyper-parameters of the learning algorithms. The main contribution of this study is to apply this idea to verification of super learner. We compare the new method with other verification methods, including nested cross validation. Tests were performed on artificial data sets of diverse size and on seven real, biomedical data sets. The resampling method, called Bootstrap Bias Correction, proved to be a reasonably precise and very cost-efficient alternative for nested cross validation.

2020-03-13 — Longitudinal Targeted Maximum Likelihood Estimation [R package ltmle version 1.2-0]

Authors: Joshua Schwab, S. Lendle, M. Petersen, M. Laan
Year: 2020
Publication Date: 2020-03-13
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2020-03-11 — A comprehensive evaluation of ensemble learning for stock-market prediction

Authors: Isaac Kofi Nti, Adebayo Felix Adekoya, B. Weyori
Year: 2020
Publication Date: 2020-03-11
Venue: Journal of Big Data
DOI: 10.1186/s40537-020-00299-5
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Stock-market prediction using machine-learning technique aims at developing effective and efficient models that can provide a better and higher rate of prediction accuracy. Numerous ensemble regressors and classifiers have been applied in stock market predictions, using different combination techniques. However, three precarious issues come in mind when constructing ensemble classifiers and regressors. The first concerns with the choice of base regressor or classifier technique adopted. The second concerns the combination techniques used to assemble multiple regressors or classifiers and the third concerns with the quantum of regressors or classifiers to be ensembled. Subsequently, the number of relevant studies scrutinising these previously mentioned concerns are limited. In this study, we performed an extensive comparative analysis of ensemble techniques such as boosting, bagging, blending and super learners (stacking). Using Decision Trees (DT), Support Vector Machine (SVM) and Neural Network (NN), we constructed twenty-five (25) different ensembled regressors and classifiers. We compared their execution times, accuracy, and error metrics over stock-data from Ghana Stock Exchange (GSE), Johannesburg Stock Exchange (JSE), Bombay Stock Exchange (BSE-SENSEX) and New York Stock Exchange (NYSE), from January 2012 to December 2018. The study outcome shows that stacking and blending ensemble techniques offer higher prediction accuracies (90–100%) and (85.7–100%) respectively, compared with that of bagging (53–97.78%) and boosting (52.7–96.32%). Furthermore, the root means square error (RMSE) recorded by stacking (0.0001–0.001) and blending (0.002–0.01) shows a better fit of ensemble classifiers and regressors based on these two techniques in market analyses compared with bagging (0.01–0.11) and boosting (0.01–0.443). 
Finally, the results undoubtedly suggest that an innovative study in the domain of stock market direction prediction ought to include ensemble techniques in their sets of algorithms.

2020-03-05 — haldensify: Highly Adaptive Lasso Conditional Density Estimation

Authors: N. Hejazi, David C. Benkeser, M. J. Laan
Year: 2020
Publication Date: 2020-03-05
DOI: 10.5281/ZENODO.3698330
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

2020-03-04 — Estimating the effect of central bank independence on inflation using longitudinal targeted maximum likelihood estimation

Authors: P. Baumann, M. Schomaker, Enzo Rossi
Year: 2020
Publication Date: 2020-03-04
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2020-0016
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
The notion that an independent central bank reduces a country’s inflation is a controversial hypothesis. To date, it has not been possible to satisfactorily answer this question because the complex macroeconomic structure that gives rise to the data has not been adequately incorporated into statistical analyses. We develop a causal model that summarizes the economic process of inflation. Based on this causal model and recent data, we discuss and identify the assumptions under which the effect of central bank independence on inflation can be identified and estimated. Given these and alternative assumptions, we estimate this effect using modern doubly robust effect estimators, i.e., longitudinal targeted maximum likelihood estimators. The estimation procedure incorporates machine learning algorithms and is tailored to address the challenges associated with complex longitudinal macroeconomic data. We do not find strong support for the hypothesis that having an independent central bank for a long period of time necessarily lowers inflation. Simulation studies evaluate the sensitivity of the proposed methods in complex settings when certain assumptions are violated and highlight the importance of working with appropriate learning algorithms for estimation.

2020-03-01 — Super learner analysis of real‐time electronically monitored adherence to antiretroviral therapy under constrained optimization and comparison to non‐differentiated care approaches for persons living with HIV in rural Uganda

Authors: Alejandra Benitez, N. Musinguzi, D. Bangsberg, M. Bwana, C. Muzoora, P. Hunt, Jeffrey N. Martin, J. Haberer, M. Petersen
Year: 2020
Publication Date: 2020-03-01
Venue: Journal of the International AIDS Society
DOI: 10.1002/jia2.25467
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Real‐time electronic adherence monitoring (EAM) systems could inform on‐going risk assessment for HIV viraemia and be used to personalize viral load testing schedules. We evaluated the potential of real‐time EAM (transferred via cellular signal) and standard EAM (downloaded via USB cable) in rural Uganda to inform individually differentiated viral load testing strategies by applying machine learning approaches.

2020-03-01 — Issue Information

Authors: Unknown
Year: 2020
Publication Date: 2020-03-01
Venue: Biometrics
DOI: 10.1111/biom.13243
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, targeted minimum loss based estimation

Abstract:
Large scale maximum average power multiple inference on time-course count data with application to RNA-seq analysis. Meng Cao, Wen Zhou, F. Jay Breidt, and Graham Peers. p. 9
Structured gene-environment interaction analysis. Mengyun Wu, Qingzhao Zhang, and Shuangge Ma. p. 23
Testing independence between two random sets for the analysis of colocalization in bioimaging. Frédéric Lavancier, Thierry Pécot, Liu Zengzhen, and Charles Kervrann. p. 36
Building generalized linear models with ultrahigh dimensional features: A sequentially conditional approach. Qi Zheng, Hyokyoung G. Hong, and Yi Li. p. 47
Integrative factorization of bidimensionally linked matrices. Jun Young Park and Eric F. Lock. p. 61
Estimation of covariance matrix of multivariate longitudinal data using modified Choleksky and hypersphere decompositions. Keunbaik Lee, Hyunsoon Cho, Min-Sun Kwak, and Eun Jin Jang. p. 75
A Bayesian approach to joint modeling of matrix-valued imaging data and treatment outcome with applications to depression studies. Bei Jiang, Eva Petkova, Thaddeus Tarpey, and R. Todd Ogden. p. 87
Global identifiability of latent class models with applications to diagnostic test accuracy studies: A Gröbner basis approach. Rui Duan, Ming Cao, Yang Ning, Mingfu Zhu, Bin Zhang, Aidan McDermott, Haitao Chu, Xiaohua Zhou, Jason H. Moore, Joseph G. Ibrahim, Daniel O. Scharfstein, and Yong Chen. p. 98
Robust inference on the average treatment effect using the outcome highly adaptive lasso. Cheng Ju, David Benkeser, and Mark J. van der Laan. p. 109
Robust inference for the stepped wedge design. James P. Hughes, Patrick J. Heagerty, Fan Xia, and Yuqi Ren. p. 119
Semiparametric mixed-scale models using shared Bayesian forests. Antonio R. Linero, Debajyoti Sinha, and Stuart R. Lipsitz. p. 131
Data-adaptive longitudinal model selection in causal inference with collaborative targeted minimum loss-based estimation. Mireille E. Schnitzer, Joel Sango, Steve Ferreira Guerra, and Mark J. van der Laan. p. 145
A geostatistical framework for combining spatially referenced disease prevalence data from multiple diagnostics. Benjamin Amoah, Peter J. Diggle, and Emanuele Giorgi. p. 158
An online updating approach for testing the proportional hazards assumption with streams of survival data. Yishu Xue, HaiYing Wang, Jun Yan, and Elizabeth D. Schifano. p. 171
Adaptive treatment allocation for comparative clinical studies with recurrent events data. Jingya Gao, Pei-Fang Su, Feifang Hu, and Siu Hung Cheung. p. 183
A response-adaptive randomization procedure for multi-armed clinical trials with normally distributed outcomes. S. Faye Williamson and Sofía S. Villar. p. 197
Novel two-phase sampling designs for studying binary outcomes. Le Wang, Matthew L. Williams, Yong Chen, and Jinbo Chen. p. 210
Design and analysis of bridging studies with prior probabilities on the null and alternative hypotheses. Donglin Zeng, Zhiying Pan, and D. Y. Lin. p. 224
Randomization inference with general interference and censoring. Wen Wei Loh, Michael G. Hudgens, John D. Clemens, Mohammad Ali, and Michael E. Emch. p. 235
A functional generalized F-test for signal detection with applications to event-related potentials significance analysis. David Causeur, Ching-Fan Sheu, Emeline Perthame, and Flavia Rufini. p. 246
Distance-based analysis of variance for brain connectivity. Russell T. Shinohara, Haochang Shou, Marco Carone, Robert Schultz, Birkan Tunc, Drew Parker, Melissa Lynne Martin, and Ragini Verma. p. 257
Improving estimation efficiency for regression with MNAR covariates. Menglu Che, Peisong Han, and Jerald F. Lawless. p. 270

2020-02-28 — Machine learning as a strategy to account for dietary synergy: an illustration based on dietary intake and adverse pregnancy outcomes.

Authors: L. Bodnar, Abigail R Cartus, S. Kirkpatrick, K. Himes, Edward H. Kennedy, H. Simhan, W. Grobman, J. Duffy, R. Silver, S. Parry, A. Naimi
Year: 2020
Publication Date: 2020-02-28
Venue: American Journal of Clinical Nutrition
DOI: 10.1093/ajcn/nqaa027
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation, tmle

Abstract:
BACKGROUND Conventional analytic approaches for studying diet patterns assume no dietary synergy, which can lead to bias if incorrectly modeled. Machine learning algorithms can overcome these limitations. OBJECTIVES We estimated associations between fruit and vegetable intake relative to total energy intake and adverse pregnancy outcomes using targeted maximum likelihood estimation (TMLE) paired with the ensemble machine learning algorithm Super Learner, and compared these with results generated from multivariable logistic regression. METHODS We used data from 7572 women in the Nulliparous Pregnancy Outcomes Study: monitoring mothers-to-be. Usual daily periconceptional intake of total fruits and total vegetables was estimated from an FFQ. We calculated the marginal risk of preterm birth, small-for-gestational-age (SGA) birth, gestational diabetes, and pre-eclampsia according to density of fruits and vegetables (cups/1000 kcal) ≥80th percentile compared with <80th percentile using multivariable logistic regression and Super Learner with TMLE. Models were adjusted for confounders, including other Healthy Eating Index-2010 components. RESULTS Using logistic regression, higher fruit and high vegetable densities were associated with 1.1% and 1.4% reductions in pre-eclampsia risk compared with lower densities, respectively. They were not associated with the 3 other outcomes. Using Super Learner with TMLE, high fruit and vegetable densities were associated with fewer cases of preterm birth (-4.0; 95% CI: -4.9, -3.0 and -3.7; 95% CI: -5.0, -2.3), SGA (-1.7; 95% CI: -2.9, -0.51 and -3.8; 95% CI: -5.0, -2.5), and pre-eclampsia (-3.2; 95% CI: -4.2, -2.2 and -4.0; 95% CI: -5.2, -2.7) per 100 births, respectively, and high vegetable densities were associated with a 0.9% increase in risk of gestational diabetes. 
CONCLUSIONS The differences in results between Super Learner with TMLE and logistic regression suggest that dietary synergy, which is accounted for in machine learning, may play a role in pregnancy outcomes. This innovative methodology for analyzing dietary data has the potential to advance the study of diet patterns.

2020-02-22 — Super Learner for Survival Data Prediction

Authors: Marzieh K Golmakani, E. Polley
Year: 2020
Publication Date: 2020-02-22
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2019-0065
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Survival analysis is a widely used method to establish a connection between a time to event outcome and a set of potential covariates. Accurately predicting the time of an event of interest is of primary importance in survival analysis. Many different algorithms have been proposed for survival prediction. However, for a given prediction problem it is rarely, if ever, possible to know in advance which algorithm will perform the best. In this paper we propose two algorithms for constructing super learners in survival data prediction where the individual algorithms are based on proportional hazards. A super learner is a flexible approach to statistical learning that finds the best weighted ensemble of the individual algorithms. Finding the optimal combination of the individual algorithms through minimizing cross-validated risk controls for over-fitting of the final ensemble learner. Candidate algorithms may range from a basic Cox model to tree-based machine learning algorithms, assuming all candidate algorithms are based on the proportional hazards framework. The ensemble weights are estimated by minimizing the cross-validated negative log partial likelihood. We compare the performance of the proposed super learners with existing models through extensive simulation studies. In all simulation scenarios, the proposed super learners are either the best fit or near the best fit. The performances of the newly proposed algorithms are also demonstrated with clinical data examples.

2020-02-14 — Abstract P5-06-17: Post hot analysis of EORTC trial: Comedication and its impact on pCR rate

Authors: B. Grandal, Loïc Ferrer, Nadir Sella, E. Laas, C. Poncet, H. Bonnefoi, A. Latouche, E. Brain, F. Reyal, A. Hamy
Year: 2020
Publication Date: 2020-02-14
Venue: Poster Session Abstracts
DOI: 10.1158/1538-7445.SABCS19-P5-06-17
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Context: Breast cancer increases with age, as does the incidence of chronic diseases and medications. There is a growing interest in chronically used medications that may influence the risk of cancer, as well as its progression. However, few studies focus on the influence of comedications on the outcomes of breast cancer (BC) and treatment. In addition, very little evidence is available regarding the impact of comedications on response to treatment in the neoadjuvant setting. Objectives: To assess whether the use of comedications modifies the response to chemotherapy. Methods: We reanalyzed the data from the EORTC 10994/BIG 1-00 phase III trial while focusing on chronic comedications prospectively registered. In this multicenter open-label trial (NCT00017095), 1856 patients with invasive breast cancer were stratified and randomly assigned to receive either a standard anthracycline regimen (FEC) or a taxane-based regimen (T-ET). Response to chemotherapy was assessed by pathological complete response (pCR) rates. We analyzed comedication according to level 1 of the Anatomical Therapeutic Chemical Classification System (ATC), grouping drugs by organ or system on which they act. A chronic comedication was defined by a comedication declared at inclusion and at least twice during follow-up visits. To estimate the average causal effect of comedication on pCR, we employed Inverse Probability Weighting (IPW) and standardization approaches, and we considered the Super Learner strategy to pick the best regression model from a list of candidates. Results: Out of 1856 patients included in the study, 1594 were included in this substudy (arm FEC n=839 (49.7%), arm T-ET, n=848 (50.3%)). The median age at inclusion was 48.5-year. BC subtypes were as follows: luminal BCs (40%), HER2-positive (25%), and TNBCs (14%). Overall, 11.4% of the patients (n=182) had at least one chronic comedication. 
The distribution of comedications by ATC level 1 class was as follows: Alimentary tract and metabolism (A): 2.6% (n=42), Cardiovascular system (C): 2% (n=32) and Nervous system (N): 6.6% (n=106). Patients taking drugs targeting the cardiovascular system tended to be older, had higher BMI, and more frequently presented with histological grade 3 and tumour status T4. The effect of comedications on pCR rates differed according to the chemotherapy regimen. The use of psychotropics (class N) in the T-ET arm was associated with increased pCR rates (OR = 2.3; 95% CI, 1.6 to 3.5; P = 0.04), whereas this finding was not observed in the FEC arm (Pinteraction = 0.04). Similarly, drugs targeting the alimentary system were associated with increased pCR rates in the FEC arm (OR = 11.4; 95% CI, 1.45 to 47; P = 0.03), whereas a lower pCR rate was observed in the T-ET arm (OR = 0.2; 95% CI, 0.1 to 0.8; P = 0.02). Discussion: In this post hoc analysis of the EORTC trial, the use of different chronic comedications during neoadjuvant chemotherapy was associated with changes in pCR rates that differed according to the chemotherapy arm. This finding calls for further research on the interactions between chemotherapy and chronic non-anticancer drug use, to determine whether subgroups of patients may derive different benefits or harms from specific associations. Citation Format: Beatriz Grandal, Loic Ferrer, Nadir Sella, Enora Laas, Coralie Poncet, Herve Bonnefoi, Aurelien Latouche, Etienne Brain, Fabien Reyal, Anne-Sophie Hamy. Post hot analysis of EORTC trial: Comedication and its impact on pCR rate [abstract]. In: Proceedings of the 2019 San Antonio Breast Cancer Symposium; 2019 Dec 10-14; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2020;80(4 Suppl):Abstract nr P5-06-17.
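
Illustrative note: as a rough sketch of the inverse probability weighting step named in the abstract above, the following Python function computes a normalized (Hajek-type) IPW contrast between exposed and unexposed outcome means. It assumes the propensity scores have already been estimated by some model. The function name `ipw_risk_difference` and its arguments are hypothetical and are not taken from the study, which reports odds ratios rather than risk differences.

```python
def ipw_risk_difference(outcomes, treated, propensity):
    """Normalized IPW estimate of E[Y(1)] - E[Y(0)].

    outcomes:   0/1 outcomes (for example, pCR yes/no)
    treated:    0/1 exposure indicators (for example, chronic comedication)
    propensity: estimated P(treated = 1 | covariates), strictly inside (0, 1)
    """
    if not all(0.0 < p < 1.0 for p in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Exposed units are weighted by 1/p, unexposed units by 1/(1 - p).
    w1 = [a / p for a, p in zip(treated, propensity)]
    w0 = [(1 - a) / (1 - p) for a, p in zip(treated, propensity)]
    if sum(w1) == 0 or sum(w0) == 0:
        raise ValueError("both exposure groups must be represented")

    # Dividing by the sum of weights (rather than n) gives the normalized form.
    mean1 = sum(w * y for w, y in zip(w1, outcomes)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcomes)) / sum(w0)
    return mean1 - mean0
```

With a constant propensity of 0.5 this reduces to the plain difference in group means; unequal propensities reweight each group toward the full population.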

2020-02-07 — The overall effect of parental supply of alcohol across adolescence on alcohol-related harms in early adulthood - a prospective cohort study.

Authors: P. Clare, T. Dobbins, R. Bruno, A. Peacock, Veronica C. Boland, W. S. Yuen, A. Aiken, L. Degenhardt, K. Kypri, T. Slade, D. Hutchinson, J. Najman, N. McBride, J. Horwood, J. Mccambridge, R. Mattick
Year: 2020
Publication Date: 2020-02-07
Venue: Addiction
DOI: 10.1111/add.15005
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND AND AIMS Recent research suggests parental supply of alcohol is associated with more risky drinking and alcohol-related harm among adolescents. However, the overall effect of parental supply across adolescence remains unclear because parental supply of alcohol varies over adolescence. Due to the complexity of longitudinal data, standard analytic methods can be biased. This study examined the effect of parental supply of alcohol on alcohol-related outcomes in early adulthood using robust methods to minimise risk of bias. DESIGN Prospective longitudinal cohort study. SETTING Australia PARTICIPANTS: Cohort of school students (n=1906) recruited in the first year of secondary school (average age 12.9yrs) from Australian schools in 2010-11, interviewed annually for 7 years. MEASUREMENTS The exposure variable was self-reported parental supply of alcohol (including sips/whole drinks) across five years of adolescence (waves 1-5). Outcome variables were self-reported binge drinking, alcohol-related harm, and symptoms of alcohol use disorder, measured in the two waves after the exposure period (waves 6-7). To reduce risk of bias, we used Targeted Maximum Likelihood Estimation to assess the (counterfactual) effect of parental supply of alcohol in all five waves versus no supply, on alcohol-related outcomes. FINDINGS Parental supply of alcohol across adolescence saw greater risk of binge drinking (RR:1.53; 95% CI:1.27-1.84) and alcohol-related harms (RR:1.44; 95% CI:1.22-1.69) in the year following the exposure period compared with no supply in adolescence. Earlier initiation of parental supply also increased risk of binge drinking (RR:1.10; 95% CI:1.05-1.14), and any alcohol-related harm (RR:1.09; 95% CI:1.05-1.13) for each year earlier parental supply began compared with later (or no) initiation. 
CONCLUSIONS Adolescents whose parents supply them with alcohol appear to have an increased risk of alcohol-related harm compared with adolescents whose parents do not supply them with alcohol. The risk appears to increase with earlier initiation of supply.

2020-02-06 — Association of Overweight and Obesity Development Between Pregnancies With Stillbirth and Infant Mortality in a Cohort of Multiparous Women.

Authors: Ya-Hui Yu, L. Bodnar, K. Himes, M. Brooks, A. Naimi
Year: 2020
Publication Date: 2020-02-06
Venue: Obstetrics and Gynecology
DOI: 10.1097/AOG.0000000000003677
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
OBJECTIVE To identify the association of newly developed prepregnancy overweight and obesity with stillbirth and infant mortality. METHODS We studied subsequent pregnancies of mothers who were normal weight at fertilization of their first identified pregnancy, from a population-based cohort that linked birth registry with death records in Pennsylvania, 2003-2013. Women with newly developed prepregnancy overweight and obesity were defined as those whose body mass index (BMI) before second pregnancy was between 25 and 29.9 or 30 or higher, respectively. Our main outcomes of interest were stillbirth (intrauterine death at 20 weeks of gestation or greater), infant mortality (less than 365 days after birth), neonatal death (less than 28 days after birth) and postneonatal death (29-365 days after birth). Associations of both prepregnancy BMI categories and continuous BMI with each outcome were estimated by nonparametric targeted minimum loss-based estimation and inverse-probability weighted dose-response curves, respectively, adjusting for race-ethnicity, smoking, and other confounders (eg, age, education). RESULTS A cohort of 212,889 women were included for infant mortality analysis (192,941 women for stillbirth analysis). The crude rate of stillbirth and infant mortality in these final analytic cohorts were 3.3 per 1,000 pregnancies and 2.9 per 1,000 live births, respectively. Compared with women who stayed at a normal weight in their second pregnancies, those becoming overweight had 1.4 (95% CI 0.6-2.1) excess stillbirths per 1,000 pregnancies. Those becoming obese had 3.6 (95% CI 1.3-5.9) excess stillbirths per 1,000 pregnancies and 2.4 (95% CI 0.4-4.4) excess neonatal deaths per 1,000 live births. There was a dose-response relationship between prepregnancy BMI increases of more than 2 units and increased risk of stillbirth and infant mortality. 
In addition, BMI increases were associated with higher risks of infant mortality among women with shorter interpregnancy intervals (less than 18 months) compared with longer intervals. CONCLUSION Transitioning from normal weight to overweight or obese between pregnancies was associated with an increased risk of stillbirth and neonatal mortality.

2020-01-16 — Super Learner for Predicting Stock Market Trends: A Case Study of Jakarta Islamic Index Stock Exchange

Authors: G. A. Dito, B. Sartono, Annisa Annisa
Year: 2020
Publication Date: 2020-01-16
Venue: Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia
DOI: 10.4108/eai.2-8-2019.2290523
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Predicting stock market trends has long been a challenging task. Many factors influence the market, which makes its trend highly dynamic and volatile. Forecasting models, a prevalent approach to predicting stock market trends, have several difficulties with these characteristics. Although forecasting models are efficient, they sometimes have high forecasting error. Reformulating the forecasting problem as a classification problem is an alternative approach to predicting stock market trends. Several studies have shown that machine learning is a suitable method for predicting stock market trends as a classification problem. This paper discusses applying a powerful machine learning method called Super Learner to predict stock market trends. In addition, this research employs several technical indicators as predictor variables. Results show that the Super Learner model is useful for predicting both short-term and long-term trends.

2020 — A Super-Learner Ensemble of Deep Networks for Vehicle-Type Classification

Authors: Mohamed A. Hedeya, A. Eid, Rehab F. Abdel-Kader
Year: 2020
Venue: IEEE Access
DOI: 10.1109/ACCESS.2020.2997286
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Automatic vehicle-type classification plays an imperative role in the development of efficient Intelligent Transportation Systems (ITS). In this paper, a super-learner ensemble is proposed for the vehicle-type classification problem. A densely connected single-split super learner is utilized to exploit the strengths and diminish the weaknesses of the individual base learners ResNet50, Xception, and DenseNet. The super learner aims to learn fusion weights in a data-adaptive manner to obtain the optimal combination of the base learners. The proposed method is simple, robust, and enhances the discrimination capabilities among the similarly-looking classes without requiring any hand-crafted features or logical reasoning. The proposed method is evaluated using two of the most challenging publicly available traffic surveillance datasets: the MIOvision Traffic Camera Dataset (MIO-TCD) and the Beijing Institute of Technology’s (BIT) vehicle classification dataset. Three variants of the super learner ensemble: RXD-CV-CW, RXD-CV-CW-NCW and Augmented-RXD, were examined on the MIO-TCD dataset with variations in applying class weights and data augmentation during training. RXD-CV-CW-NCW and Augmented-RXD share the third place among the published state-of-the-art methods reported in the MIO-TCD classification challenge. Augmented-RXD generalizes to the classes in common between the two datasets without degrading its performance on the MIO-TCD dataset. Both variants achieved an overall accuracy of 97.94%, and a Cohen Kappa score of 96.78%. In addition, the super-learner variants that we trained on the BIT-Vehicle dataset images achieved overall accuracies of up to 97.62%.
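
Illustrative note: the idea of learning fusion weights for base learners in a data-adaptive way can be sketched in its simplest form as a convex combination of two learners, chosen to minimize squared error on held-out predictions. This is a deliberately simplified stand-in, not the densely connected architecture described in the abstract above. The function name `convex_stacking_weight`, the grid search, and the two-learner restriction are all assumptions made for illustration.

```python
def convex_stacking_weight(preds_a, preds_b, targets, grid_size=101):
    """Return the weight w in [0, 1] on learner A that minimizes the mean
    squared error of w * preds_a + (1 - w) * preds_b against targets.

    The predictions are assumed to be held-out (cross-validated) so that the
    weight is not chosen on data the base learners were fitted to.
    """
    if grid_size < 2:
        raise ValueError("grid_size must be at least 2")
    best_w, best_loss = 0.0, float("inf")
    for i in range(grid_size):
        w = i / (grid_size - 1)  # evenly spaced candidates from 0 to 1
        loss = sum(
            (w * a + (1 - w) * b - y) ** 2
            for a, b, y in zip(preds_a, preds_b, targets)
        ) / len(targets)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

A weight of 1 means the ensemble collapses onto learner A, and a weight of 0 onto learner B; intermediate values indicate that blending the two fits the held-out data better than either alone.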

2019 (49 papers)
2019-12-23 — Estimating treatment importance in multidrug‐resistant tuberculosis using Targeted Learning: An observational individual patient data network meta‐analysis

Authors: Guanbo Wang, M. Schnitzer, D. Menzies, P. Viiklepp, T. Holtz, A. Benedetti
Year: 2019
Publication Date: 2019-12-23
Venue: Biometrics
DOI: 10.1111/biom.13210
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Persons with multidrug‐resistant tuberculosis (MDR‐TB) have a disease resulting from a strain of tuberculosis (TB) that does not respond to at least isoniazid and rifampicin, the two most effective anti‐TB drugs. MDR‐TB is always treated with multiple antimicrobial agents. Our data consist of individual patient data from 31 international observational studies with varying prescription practices, access to medications, and distributions of antibiotic resistance. In this study, we develop identifiability criteria for the estimation of a global treatment importance metric in the context where not all medications are observed in all studies. With stronger causal assumptions, this treatment importance metric can be interpreted as the effect of adding a medication to the existing treatments. We then use this metric to rank 15 observed antimicrobial agents in terms of their estimated add‐on value. Using the concept of transportability, we propose an implementation of targeted maximum likelihood estimation, a doubly robust and locally efficient plug‐in estimator, to estimate the treatment importance metric. A clustered sandwich estimator is adopted to compute variance estimates and produce confidence intervals. Simulation studies are conducted to assess the performance of our estimator, verify the double robustness property, and assess the appropriateness of the variance estimation approach.

2019-12-13 — Conditional Super Learner

Authors: Gilmer Valdes, Y. Interian, E. Gennatas, Mark J. van der Laan
Year: 2019
Publication Date: 2019-12-13
Venue: arXiv.org
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In this article we consider the Conditional Super Learner (CSL), an algorithm which selects the best model candidate from a library conditional on the covariates. The CSL expands the idea of using cross-validation to select the best model and merges it with meta learning. Here we propose a specific algorithm that finds a local minimum to the problem posed, prove that it converges at a rate faster than Op(n^-1/4), and offer extensive empirical evidence that it is an excellent candidate to substitute for stacking or for the analysis of hierarchical problems.

2019-12-10 — Super Learner Prediction [R package SuperLearner version 2.0-26]

Authors: E. Polley, E. LeDell, Chris J. Kennedy, M. Laan
Year: 2019
Publication Date: 2019-12-10
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2019-12-03 — Confounding Adjustment Methods for Multi-level Treatment Comparisons Under Lack of Positivity and Unknown Model Specification

Authors: Diop S. Arona, Duchesne Thierry, Cumming Steven, Diop Awa, Talbot Denis
Year: 2019
Publication Date: 2019-12-03
DOI: 10.1080/02664763.2021.1911966
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
the overlap weights, augmented overlap weights, bias-corrected matching and targeted maximum likelihood. A simple variance estimator for the overlap weight estimators that can naturally be combined with machine learning algorithms is proposed. In a simulation study, we investigated the empirical performance of these methods as well as those of simpler alternatives, standardization, inverse probability weighting and matching. Our proposed variance estimator performed well, even at a sample size of 500. Adjustment methods that included an outcome modeling component performed better than those that only modeled the treatment mechanism. Additionally, a machine learning implementation was observed to efficiently compensate for the unknown model specification for the former methods, but not the latter. Based on these results, the wildfire data were analyzed using the augmented overlap weight estimator. With respect to effectiveness of alternate fire-suppression interventions, the results were counter-intuitive, indeed the opposite of what would be expected on subject-matter grounds. This suggests the presence in the data of unmeasured confounding bias.

2019-12-01 — Real Time Anomaly detection-Based QoE Feature selection and Ensemble Learning for HTTP Video Services.

Authors: T. Abar, Asma BEN LETAIFA, S. E. Asmi
Year: 2019
Publication Date: 2019-12-01
Venue: Information and Communication Technologies and Accessibility
DOI: 10.1109/ICTA49490.2019.9144867
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
The use of machine learning (ML) to predict user perception has increased greatly in the last decade. However, it is difficult to determine the best category of model for predicting Quality of Experience (QoE) in operational networks. Moreover, the result obtained by ML is sometimes not relevant, i.e., not what would be expected under the present conditions; this is the problem of anomalies in the machine learning context. To fill this gap, in this paper we address ML anomaly detection in the QoE context for video streaming services in a Software Defined Networking (SDN) environment. On the one hand, we explore a feature selection methodology to extract the features most correlated with video QoE. On the other hand, we investigate different ensemble learning approaches to improve the detection of QoE anomalies, following the known stacking or super learning model. Results show that the proposed model achieves high performance compared with other types of ensemble learning and single learning techniques.

2019-11-22 — Far from MCAR

Authors: L. Balzer, J. Ayieko, D. Kwarisiima, G. Chamie, E. Charlebois, Joshua Schwab, M. J. van der Laan, M. Kamya, D. Havlir, M. Petersen
Year: 2019
Publication Date: 2019-11-22
Venue: Epidemiology
DOI: 10.1101/19012781
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Population-level estimates of disease prevalence and control are needed to assess prevention and treatment strategies. However, available data often suffer from differential missingness. For example, population-level HIV viral suppression is the proportion of all HIV-positive persons with suppressed viral replication. Individuals with measured HIV status, and among HIV-positive individuals those with measured viral suppression, likely differ from those without such measurements. Methods: We discuss three sets of assumptions to identify population-level suppression in the intervention arm of the SEARCH Study (NCT01864603), a community randomized trial in rural Kenya and Uganda (2013–2017). Using data on nearly 100,000 participants, we compare estimates from (1) an unadjusted approach assuming data are missing-completely-at-random (MCAR); (2) stratification on age group, sex, and community; and (3) targeted maximum likelihood estimation to adjust for a larger set of baseline and time-updated variables. Results: Despite high measurement coverage, estimates of population-level viral suppression varied by identification assumption. Unadjusted estimates were most optimistic: 50% (95% confidence interval [CI] = 46%, 54%) of HIV-positive persons suppressed at baseline, 80% (95% CI = 78%, 82%) at year 1, 85% (95% CI = 83%, 86%) at year 2, and 85% (95% CI = 83%, 87%) at year 3. Stratifying on baseline predictors yielded slightly lower estimates, and full adjustment reduced estimates meaningfully: 42% (95% CI = 37%, 46%) of HIV-positive persons suppressed at baseline, 71% (95% CI = 69%, 73%) at year 1, 76% (95% CI = 74%, 78%) at year 2, and 79% (95% CI = 77%, 81%) at year 3. Conclusions: Estimation of population-level disease burden and control requires appropriate adjustment for missing data. 
Even in large studies with limited missingness, estimates relying on the MCAR assumption or baseline stratification should be interpreted cautiously.
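
Illustrative note: the contrast between the first two approaches in the abstract above (an unadjusted estimate that assumes MCAR versus stratification on baseline variables) can be sketched as follows. The function names and the use of `None` to mark an unmeasured outcome are assumptions for illustration only; the third approach, targeted maximum likelihood estimation with time-updated covariates, is not reproduced here.

```python
def complete_case_mean(values):
    """Unadjusted estimate: average the observed outcomes, ignoring who is missing."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed outcomes")
    return sum(observed) / len(observed)


def stratified_mean(values, strata):
    """Stratified estimate: average observed outcomes within each stratum, then
    weight each stratum by its share of the full population (measured or not)."""
    by_stratum = {}
    for value, stratum in zip(values, strata):
        by_stratum.setdefault(stratum, []).append(value)

    total = len(values)
    estimate = 0.0
    for members in by_stratum.values():
        observed = [v for v in members if v is not None]
        if not observed:
            raise ValueError("a stratum has no observed outcomes")
        estimate += (len(members) / total) * (sum(observed) / len(observed))
    return estimate
```

When missingness is concentrated in one stratum, the two estimates diverge: the complete-case mean over-represents well-measured strata, while the stratified mean restores each stratum to its population share.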

2019-11-19 — Machine learning in the estimation of causal effects: targeted minimum loss-based estimation and double/debiased machine learning.

Authors: I. Díaz
Year: 2019
Publication Date: 2019-11-19
Venue: Biostatistics
DOI: 10.1093/biostatistics/kxz042
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
In recent decades, the fields of statistical and machine learning have seen a revolution in the development of data-adaptive regression methods that have optimal performance under flexible, sometimes minimal, assumptions on the true regression functions. These developments have impacted all areas of applied and theoretical statistics and have allowed data analysts to avoid the biases incurred under the pervasive practice of parametric model misspecification. In this commentary, I discuss issues around the use of data-adaptive regression in estimation of causal inference parameters. To ground ideas, I focus on two estimation approaches with roots in semi-parametric estimation theory: targeted minimum loss-based estimation (TMLE; van der Laan and Rubin, 2006) and double/debiased machine learning (DML; Chernozhukov and others, 2018). This commentary is not comprehensive, the literature on these topics is rich, and there are many subtleties and developments which I do not address. These two frameworks represent only a small fraction of an increasingly large number of methods for causal inference using machine learning. To my knowledge, they are the only methods grounded in statistical semi-parametric theory that also allow unrestricted use of data-adaptive regression techniques.

2019-11-15 — One‐step targeted maximum likelihood estimation for time‐to‐event outcomes

Authors: Weixin Cai, M. J. van der Laan
Year: 2019
Publication Date: 2019-11-15
Venue: Biometrics
DOI: 10.1111/biom.13172
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Researchers in observational survival analysis are interested in not only estimating the survival curve nonparametrically but also obtaining statistical inference for the parameter. We consider right‐censored failure time data where we observe n independent and identically distributed observations of a vector random variable consisting of baseline covariates, a binary treatment at baseline, a survival time subject to right censoring, and the censoring indicator. We assume the baseline covariates are allowed to affect the treatment and censoring so that an estimator that ignores covariate information would be inconsistent. The goal is to use these data to estimate the counterfactual average survival curve of the population if all subjects are assigned the same treatment at baseline. Existing observational survival analysis methods do not result in monotone survival curve estimators, which is undesirable and may lose efficiency by not constraining the shape of the estimator using the prior knowledge of the estimand. In this paper, we present a one‐step Targeted Maximum Likelihood Estimator (TMLE) for estimating the counterfactual average survival curve. We show that this new TMLE can be executed via recursion in small local updates. We demonstrate the finite sample performance of this one‐step TMLE in simulations and an application to monoclonal gammopathy data.

2019-11-13 — High Accuracy, Low-Cost Transcriptional Diagnostic to Transform Lymphoma Care in Low- and Middle-Income Countries

Authors: Edward L Briercheck, F. Valvert, E. Solorzano, Oscar Silva, M. Puligandla, Marcos Mauricio Siliézar Tala, Timothy Guyon, Samuel L. Dixon, R. Terbrueggen, Y. Natkunam, K. Stevenson, D. Weinstock
Year: 2019
Publication Date: 2019-11-13
Venue: Blood
DOI: 10.1182/blood-2019-121397
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Introduction: The majority of people worldwide lack access to high accuracy diagnostics to guide lymphoma therapy. As a consequence, many patients receive incorrect or no treatment. We hypothesized that a low-cost, parsimonious gene expression assay using FFPE biopsies from low-income settings could distinguish multiple lymphoma subtypes. Accurate diagnoses would make it possible to extend high therapeutic index agents currently available within high-income countries to underserved patients around the world. Methods: We reviewed 900 patient cases from INCAN, the public cancer hospital in Guatemala City, for which a biopsy was obtained between 2006-2018 due to the clinician's suspicion for lymphoma. Whole-slide sections were assessed by H&E and then involved areas were embedded into tissue microarrays and analyzed at Stanford University according to the 2016 WHO classification (>35,000 individual IHC and FISH assessments). Consensus diagnosis for each case was made by two expert hematopathologists (YN and OS). Diagnoses were then binned into: high grade B-cell lymphoma (BCL), diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), Hodgkin's lymphoma (HL), mantle cell lymphoma (MCL), marginal zone lymphoma (MZL), NK/T-cell lymphoma (NKTCL), T-cell lymphoma (TCL) or non-lymphoma (NL). The latter included lymphoid hyperplasia, granulomatous inflammation, chronic inflammation, reactive follicular hyperplasia, angiomatous hamartoma, chronic gastritis, dermatitis, normal skin and tonsil. DLBCL cell-of-origin (COO) was classified based on Hans algorithm. We selected 37 genes based on studies that defined subtype-specific expression, DLBCL COO, or therapeutic relevance. We established a chemical ligation-based probe amplification (CLPA, DxTerity Diagnostics) assay that quantifies expression of the 37 genes (plus 2 normalizer genes) by routine capillary electrophoresis at a cost of <$10/sample. 
Candidate models were trained on data from the diagnostic samples using 10-fold cross validation with 5 repeats using the Classification And REgression Training (caret) package in R v.3.5.1. The data were split 70/30 into training and validation sets. A 2-staged classification approach was used to determine the sample's class label. Fourteen models were used as "base learners" and the class probabilities from each model were then used as predictors in a random forest ("super learner") to assign the class label with the highest probability value. An additional two-class model was developed using analogous methodology to classify samples called DLBCL in the first stage as non-GCB or GCB. Results: The 900 patient cases included >50 different malignant and non-malignant disorders. We selected 648 cases for gene expression analysis to ensure adequate statistical representation of major lymphoma types for model building and validation. FFPE scrolls were utilized for CLPA, of which 59 (9.1%) failed quality control and 38 were from patients with relapsed disease. Assay turnaround time was <7 days. 551 diagnostic samples were divided into 70% (n=391) training and 30% (n=160) validation cohorts (Table). Overall accuracy for the validation cohort was 88.8% [95% CI; 82.8, 93.2], with >90% accuracy for DLBCL, HL, MCL, NKTCL and NL (Table). Among cases diagnosed as BCL by standard IHC/FISH but DLBCL by CLPA, 3 of 4 had Ki67<50%, suggesting biology more similar to DLBCL. 6 of 7 misclassified MZL and FL cases were classified by CLPA as DLBCL, raising the possibility that small areas of transformed disease may have been present within the biopsies. Accuracy for DLBCL COO classification from the validation cohort compared to Hans algorithm staining (n=59) was 89.8% [95% CI; 79.2-96.1]. Accuracy for relapse samples as a test cohort (n=38) was >90% for DLBCL, FL, HL, MCL and NKTCL. 
Summary: Classification of biopsies into biologically- and therapeutically-relevant bins is feasible based on parsimonious, gene expression-based, statistical-learning algorithms. Importantly, this approach has high accuracy both for lymphoma subtypes and for non-lymphoma diagnoses. The assay is highly cost-effective, rapid, uses basic clinical laboratory equipment, and is now being performed on site at INCAN. In summary, a CLPA-based transcriptional assay could have broad utility across the globe for diagnosis, subtyping, COO classification and assessment of therapeutic targets within FFPE biopsies. Guyon: Dexterity Diagnostics: Employment. Dixon: Dexterity Diagnostics: Employment. Terbrueggen: Dexterity Diagnostics: Employment, Equity Ownership. Stevenson: Celgene: Research Funding. Weinstock: Verastem Oncology: Research Funding; Celgene: Research Funding.

2019-11-13 — Ensemble modelling in descriptive epidemiology: burden of disease estimation.

Authors: Marlena S Bannick, Madeline McGaughey, A. Flaxman
Year: 2019
Publication Date: 2019-11-13
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyz223
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Ensemble modelling is a quantitative method that combines information from multiple individual models and has shown great promise in statistical machine learning. Ensemble models have a theoretical claim to being models that make the 'best' predictions possible. Applications of ensemble models to health research have included applying ensemble models like the super learner and random forests to epidemiological prediction tasks. Recently, ensemble methods have been applied successfully in burden of disease estimation. This article aims to provide epidemiologists with a practical understanding of the mechanisms of an ensemble model and insight into constructing ensemble models that are grounded in the epidemiological dynamics of the prediction problem of interest. We summarize the history of ensemble models, present a user-friendly framework for conceptualizing and constructing ensemble models, walk the reader through a tutorial of applying the framework to an application in burden of disease estimation, and discuss further applications.

2019-11-12 — Clinical Outcomes among Patients with Drug-resistant Tuberculosis receiving Bedaquiline or Delamanid Containing Regimens.

Authors: R. Kempker, L. Mikiashvili, Y. Zhao, D. Benkeser, Ketevan Barbakadze, N. Bablishvili, Z. Avaliani, C. Peloquin, Henry M. Blumberg, M. Kipiani
Year: 2019
Publication Date: 2019-11-12
Venue: Clinical Infectious Diseases
DOI: 10.1093/cid/ciz1107
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
BACKGROUND Bedaquiline and delamanid are newly available drugs for treating multidrug-resistant tuberculosis (MDR TB); however, there is limited data guiding their use and no comparison studies. METHODS We conducted a prospective observational study among patients with MDR TB in Georgia receiving a bedaquiline or delamanid-based treatment regimen. Monthly sputum cultures, minimal inhibitory concentration testing, and adverse event monitoring were performed. Primary outcomes were culture conversion rates and clinical outcomes. Targeted maximum likelihood estimation (TMLE) and superlearning were utilized to produce a covariate-adjusted proportion of outcomes for each regimen. RESULTS Among 156 patients with MDR TB, 100 were enrolled and 95 were receiving a bedaquiline (n=64) or delamanid (n=31) based regimen. Most were male (82%) and the median age was 38 years. Rates of previous treatment (56%) and cavitary disease (61%) were high. The most common companion drugs included linezolid, clofazimine, cycloserine and a fluoroquinolone. Median effective drugs received among patients on bedaquiline (4, IQR 4-4) and delamanid (4, IQR 3.5-5) based regimens were similar. Rates of acquired drug resistance were significantly higher among patients receiving delamanid versus bedaquiline (36% vs. 10%, p <0.01). Adjusted rates of sputum culture conversion at two months (67 vs. 47%, p=0.10) and six months (95 vs. 74%, p<0.01) and favorable clinical outcomes (96 vs. 72%, p<0.01) were higher among patients receiving bedaquiline versus delamanid. CONCLUSIONS Among patients with MDR TB, bedaquiline-based regimens were associated with higher rates of sputum culture conversion and favorable outcomes and a lower rate of acquired drug resistance versus delamanid-based regimens.

2019-11-12 — Associations between alcohol use and HIV care cascade outcomes among adults undergoing population-based HIV testing in East Africa.

Authors: Sarah B Puryear, L. Balzer, J. Ayieko, D. Kwarisiima, J. Hahn, E. Charlebois, T. Clark, C. Cohen, E. Bukusi, M. Kamya, M. Petersen, D. Havlir, G. Chamie
Year: 2019
Publication Date: 2019-11-12
Venue: AIDS (London)
DOI: 10.1097/QAD.0000000000002427
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
OBJECTIVE To assess the impact of alcohol use on HIV care cascade outcomes. DESIGN Cross-sectional analyses. METHODS We evaluated HIV care cascade outcomes and alcohol use in adults (≥15 years) during baseline (2013-2014) population-based HIV testing in 28 Kenyan and Ugandan communities. 'Alcohol use' included any current use and was stratified by Alcohol Use Disorders Identification Test-Concise (AUDIT-C) scores: nonhazardous/low (1-3 men/1-2 women), hazardous/medium (4-5 men/3-5 women), hazardous/high (6-7), hazardous/very-high (8-12). We estimated cascade outcomes and relative risks associated with each drinking level using targeted maximum likelihood estimation, adjusting for confounding and missing measures. RESULTS Among 118 923 adults, 10 268 (9%) tested HIV-positive. Of those, 10 067 (98%) completed alcohol screening: 1626 (16%) reported drinking, representing 7% of women (467/6499) and 33% of men (1159/3568). Drinking levels were: low (48%), medium (34%), high (11%), very high (7%). Drinkers were less likely to be previously HIV diagnosed [58% (95% CI: 55-61%)] than nondrinkers [66% (95% CI: 65-67%); RR: 0.87 (95% CI: 0.83-0.92)]. If previously diagnosed, drinkers were less likely to be on ART [77% (95% CI: 73-80%)] than nondrinkers [83% (95% CI: 82-84%); RR: 0.93 (95% CI: 0.89-0.97)]. If on ART, there was no association between alcohol use and viral suppression; however, very-high-level users were less likely to be suppressed [RR: 0.80 (95% CI: 0.68-0.94)] versus nondrinkers. On a population level, viral suppression was 38% (95% CI: 36-41%) among drinkers and 44% (95% CI: 43-45%) among nondrinkers [RR: 0.87 (95% CI: 0.82-0.94)], an association seen at all drinking levels. CONCLUSION Alcohol use was associated with lower viral suppression; this may be because of decreased HIV diagnosis and ART use.

2019-11-07 — Machine learning to identify persons at high-risk of HIV acquisition in rural Kenya and Uganda.

Authors: L. Balzer, D. Havlir, M. Kamya, G. Chamie, E. Charlebois, T. Clark, Catherine A Koss, D. Kwarisiima, J. Ayieko, N. Sang, J. Kabami, Mucunguzi Atukunda, V. Jain, C. Camlin, C. Cohen, E. Bukusi, M. J. van der Laan, M. Petersen
Year: 2019
Publication Date: 2019-11-07
Venue: Clinical Infectious Diseases
DOI: 10.1093/cid/ciz1096
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
BACKGROUND In generalized epidemic settings, strategies are needed to prioritize individuals at higher risk of HIV acquisition for prevention services such as pre-exposure prophylaxis. We used population-level HIV testing data from rural Kenya and Uganda to construct HIV risk scores and assessed their ability to identify seroconversions. METHODS Between 2013-2017, >75% of residents in 16 communities in the SEARCH Study tested annually for HIV. In this population, we evaluated three strategies for using demographic factors to predict the one-year risk of HIV seroconversion: (1) membership in ≥1 known "Risk Group" (e.g., young woman or HIV-infected spouse); (2) a "Model-based" risk score constructed with logistic regression; (3) a "Machine Learning" risk score constructed with the Super Learner algorithm. We hypothesized Machine Learning would identify high-risk individuals more efficiently (fewer persons targeted for a fixed sensitivity) and with higher sensitivity (for a fixed number of persons targeted) than either other approach. RESULTS 75,558 HIV-negative persons contributed 166,723 person-years of follow-up; 519 seroconverted. Machine Learning improved efficiency; to achieve a fixed sensitivity of 50%, the Risk Group strategy targeted 42% of the population, Model-based 27%, and Machine Learning 18%. Machine Learning also improved sensitivity; with an upper limit of 45% targeted, the Risk Group strategy correctly classified 58% of seroconversions, Model-based 68%, and Machine Learning 78%. CONCLUSIONS Machine learning improved classification of individuals at risk of HIV acquisition compared to a model-based approach or reliance on known risk groups, and could inform targeting of prevention strategies in generalized epidemic settings.

2019-11-06 — Data‐adaptive longitudinal model selection in causal inference with collaborative targeted minimum loss‐based estimation

Authors: M. Schnitzer, Joel Sango, Steve Ferreira Guerra, M. J. van der Laan
Year: 2019
Publication Date: 2019-11-06
Venue: Biometrics
DOI: 10.1111/biom.13135
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Causal inference methods have been developed for longitudinal observational study designs where confounding is thought to occur over time. In particular, one may estimate and contrast the population mean counterfactual outcome under specific exposure patterns. In such contexts, confounders of the longitudinal treatment‐outcome association are generally identified using domain‐specific knowledge. However, this may leave an analyst with a large set of potential confounders that may hinder estimation. Previous approaches to data‐adaptive model selection for this type of causal parameter were limited to the single time‐point setting. We develop a longitudinal extension of a collaborative targeted minimum loss‐based estimation (C‐TMLE) algorithm that can be applied to perform variable selection in the models for the probability of treatment with the goal of improving the estimation of the population mean counterfactual outcome under a fixed exposure pattern. We investigate the properties of this method through a simulation study, comparing it to G‐Computation and inverse probability of treatment weighting. We then apply the method in a real‐data example to evaluate the safety of trimester‐specific exposure to inhaled corticosteroids during pregnancy in women with mild asthma. The data for this study were obtained from the linkage of electronic health databases in the province of Quebec, Canada. The C‐TMLE covariate selection approach allowed for a reduction of the set of potential confounders, which included baseline and longitudinal variables.

2019-10-23 — Hybridizing Machine Learning Methods and Finite Mixture Models for Estimating Heterogeneous Treatment Effects in Latent Classes

Authors: Youmi Suk, Jee-Seon Kim, Hyunseung Kang
Year: 2019
Publication Date: 2019-10-23
Venue: Journal of educational and behavioral statistics
DOI: 10.3102/1076998620951983
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
There has been increasing interest in exploring heterogeneous treatment effects using machine learning (ML) methods such as causal forests, Bayesian additive regression trees, and targeted maximum likelihood estimation. However, there is little work on applying these methods to estimate treatment effects in latent classes defined by well-established finite mixture/latent class models. This article proposes a hybrid method, a combination of finite mixture modeling and ML methods from causal inference to discover effect heterogeneity in latent classes. Our simulation study reveals that hybrid ML methods produced more precise and accurate estimates of treatment effects in latent classes. We also use hybrid ML methods to estimate the differential effects of private lessons across latent classes from Trends in International Mathematics and Science Study data.

2019-10-10 — Estimating treatment effects with machine learning.

Authors: K. McConnell, S. Lindner
Year: 2019
Publication Date: 2019-10-10
Venue: Health Services Research
DOI: 10.1111/1475-6773.13212
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
OBJECTIVE To demonstrate the performance of methodologies that include machine learning (ML) algorithms to estimate average treatment effects under the assumption of exogeneity (selection on observables). DATA SOURCES Simulated data and observational data on hospitalized adults. STUDY DESIGN We assessed the performance of several ML-based estimators, including Targeted Maximum Likelihood Estimation, Bayesian Additive Regression Trees, Causal Random Forests, Double Machine Learning, and Bayesian Causal Forests, applying these methods to simulated data as well as data on the effects of right heart catheterization. PRINCIPAL FINDINGS In Monte Carlo studies, ML-based estimators generated estimates with smaller bias than traditional regression approaches, demonstrating substantial (69 percent-98 percent) bias reduction in some scenarios. Bayesian Causal Forests and Double Machine Learning were top performers, although all were sensitive to high dimensional (>150) sets of covariates. CONCLUSIONS ML-based methods are promising methods for estimating treatment effects, allowing for the inclusion of many covariates and automating the search for nonlinearities and interactions among variables. We provide guidance and sample code for researchers interested in implementing these tools in their own empirical work.

2019-10-02 — Exercise During the First Trimester and Infant Size at Birth: Targeted Maximum Likelihood Estimation of the Causal Risk Difference.

Authors: S. Ehrlich, R. Neugebauer, Juanran Feng, M. Hedderson, A. Ferrara
Year: 2019
Publication Date: 2019-10-02
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwz213
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
This cohort study sought to estimate the differences in risk of delivering small and large for gestational age infants (SGA and LGA, respectively) for exercise during the first trimester of pregnancy (versus no exercise) among 2,286 women receiving care at Kaiser Permanente Northern California in 2013-2017. Exercise was assessed by questionnaire. SGA and LGA were determined by the sex and gestational age specific birthweight distributions of the 2017 U.S. Natality file. Risk differences were estimated by targeted maximum likelihood estimation, with and without data-adaptive prediction (machine learning). Analyses were also stratified by prepregnancy weight status. Overall, exercise at the cohort-specific 75th percentile was associated with an increased risk of SGA of 4.5 (95% CI 2.1, 6.8) per 100 births, and decreased risk of LGA of 2.8 (95% CI 0.5, 5.1) per 100 births; similar findings were observed among the underweight and normal weight women but no associations were found among those with overweight or obesity. Meeting Physical Activity Guidelines was associated with increased risk of SGA and decreased risk of LGA, but only among underweight and normal weight women. Any vigorous exercise reduced the risk of LGA in underweight and normal weight women only, and was not associated with SGA risk.

2019-10-01 — SUPER Learning: A Supervised-Unsupervised Framework for Low-Dose CT Image Reconstruction

Authors: Zhipeng Li, Siqi Ye, Y. Long, S. Ravishankar
Year: 2019
Publication Date: 2019-10-01
Venue: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/ICCVW.2019.00490
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Recent years have witnessed growing interest in machine learning-based models and techniques for low-dose X-ray CT (LDCT) imaging tasks. The methods can typically be categorized into supervised learning methods and unsupervised or model-based learning methods. Supervised learning methods have recently shown success in image restoration tasks. However, they often rely on large training sets. Model-based learning methods such as dictionary or transform learning do not require large or paired training sets and often have good generalization properties, since they learn general properties of CT image sets. Recent works have shown the promising reconstruction performance of methods such as PWLS-ULTRA that rely on clustering the underlying (reconstructed) image patches into a learned union of transforms. In this paper, we propose a new Supervised-UnsuPERvised (SUPER) reconstruction framework for LDCT image reconstruction that combines the benefits of supervised learning methods and (unsupervised) transform learning-based methods such as PWLS-ULTRA that involve highly image-adaptive clustering. The SUPER model consists of several layers, each of which includes a deep network learned in a supervised manner and an unsupervised iterative method that involves image-adaptive components. The SUPER reconstruction algorithms are learned in a greedy manner from training data. The proposed SUPER learning methods dramatically outperform both the constituent supervised learning-based networks and iterative algorithms for LDCT, and use much fewer iterations in the iterative reconstruction modules.

2019-10-01 — Diet as a Source of Exposure to Environmental Contaminants for Pregnant Women and Children from Six European Countries

Authors: E. Papadopoulou, L. Haug, A. Sakhi, S. Andrušaitytė, X. Basagaña, A. Brantsaeter, M. Casas, S. Fernández-Barrés, R. Gražulevičienė, H. Knutsen, L. Maitre, H. Meltzer, R. McEachan, T. Roumeliotaki, R. Slama, M. Vafeiadi, John Wright, M. Vrijheid, C. Thomsen, L. Chatzi
Year: 2019
Publication Date: 2019-10-01
Venue: Environmental Health Perspectives
DOI: 10.1289/ehp5324
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Background: Pregnant women and children are especially vulnerable to exposures to food contaminants, and a balanced diet during these periods is critical for optimal nutritional status. Objectives: Our objective was to study the association between diet and measured blood and urinary levels of environmental contaminants in mother–child pairs from six European birth cohorts (n=818 mothers and 1,288 children). Methods: We assessed the consumption of seven food groups and the blood levels of organochlorine pesticides, polybrominated diphenyl ethers, polychlorinated biphenyls (PCBs), per- and polyfluoroalkyl substances (PFAS), and heavy metals and urinary levels of phthalate metabolites, phenolic compounds, and organophosphate pesticide (OP) metabolites. Organic food consumption during childhood was also studied. We applied multivariable linear regressions and targeted maximum likelihood based estimation (TMLE). Results: Maternal high (≥4 times/week) versus low (<2 times/week) fish consumption was associated with 15% higher PCBs [geometric mean (GM) ratio=1.15; 95% confidence interval (CI): 1.02, 1.29], 42% higher perfluoroundecanoate (PFUnDA) (GM ratio=1.42; 95% CI: 1.20, 1.68), 89% higher mercury (Hg) (GM ratio=1.89; 95% CI: 1.47, 2.41) and a 487% increase in arsenic (As) (GM ratio=4.87; 95% CI: 2.57, 9.23) levels. In children, high (≥3 times/week) versus low (<1.5 times/week) fish consumption was associated with 23% higher perfluorononanoate (PFNA) (GM ratio=1.23; 95% CI: 1.08, 1.40), 36% higher PFUnDA (GM ratio=1.36; 95% CI: 1.12, 1.64), 37% higher perfluorooctane sulfonate (PFOS) (GM ratio=1.37; 95% CI: 1.22, 1.54), and >200% higher Hg and As [GM ratio=3.87 (95% CI: 1.91, 4.31) and GM ratio=2.68 (95% CI: 2.23, 3.21)] concentrations. Using TMLE analysis, we estimated that fish consumption within the recommended 2–3 times/week resulted in lower PFAS, Hg, and As compared with higher consumption. Fruit consumption was positively associated with OP metabolites. Organic food consumption was negatively associated with OP metabolites. Discussion: Fish consumption is related to higher PFAS, Hg, and As exposures. In addition, fruit consumption is a source of exposure to OPs. https://doi.org/10.1289/EHP5324

2019-09-27 — A Data-Adaptive Targeted Learning Approach of Evaluating Viscoelastic Assay Driven Trauma Treatment Protocols

Authors: Linqing Wei, Lucy Z. Kornblith, A. Hubbard
Year: 2019
Publication Date: 2019-09-27
Link: Semantic Scholar
Matched Keywords: super learning, tmle

Abstract:
Estimating the impact of trauma treatment protocols is complicated by the high dimensional yet finite sample nature of trauma data collected from observational studies. Viscoelastic assays are highly predictive measures of hemostasis. However, the effectiveness of thromboelastography (TEG) based treatment protocols has not been statistically [...]. To conduct robust and reliable estimation with sparse data, we built an estimation "machine" for estimating causal impacts of candidate variables using the collaborative targeted maximum loss-based estimation (CTMLE) framework. The computational efficiency is achieved by using the scalable version of CTMLE such that the covariates are pre-ordered by summary statistics of their importance before proceeding to the estimation [...]. To extend the application of the estimator in practice, we used super learning in combination with CTMLE to flexibly choose the best convex combination of algorithms. By selecting the optimal covariates set in high dimension and reducing constraints in choosing pre-ordering algorithms, we are able to construct a robust and data-adaptive model to estimate the parameter of interest. Under this estimation framework, CTMLE outperformed the other doubly robust estimators (IPW, AIPW, stabilized IPW, TMLE) in the simulation study. CTMLE demonstrated very accurate estimation of the target parameter (ATE). Applying CTMLE on the real trauma data, the treatment protocol (using TEG values immediately after injury) showed significant improvement in trauma patient hemostasis status (control of bleeding), and a decrease in mortality rate at 6h compared to standard care. The estimation results did not show significant change in mortality rate at 24h after arrival.

2019-09-09 — Super learning for daily streamflow forecasting: Large-scale demonstration and comparison with multiple machine learning algorithms

Authors: Hristos Tyralis, Georgia Papacharalampous, A. Langousis
Year: 2019
Publication Date: 2019-09-09
Venue: arXiv.org
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Daily streamflow forecasting through data-driven approaches is traditionally performed using a single machine learning algorithm. Existing applications are mostly restricted to examination of few case studies, not allowing accurate assessment of the predictive performance of the algorithms involved. Here we propose super learning (a type of ensemble learning) by combining 10 machine learning algorithms. We apply the proposed algorithm in one-step ahead forecasting mode. For the application, we exploit a big dataset consisting of 10-year long time series of daily streamflow, precipitation and temperature from 511 basins. The super learner improves over the performance of the linear regression algorithm by 20.06%, outperforming the "hard to beat in practice" equal weight combiner. The latter improves over the performance of the linear regression algorithm by 19.21%. The best performing individual machine learning algorithm is neural networks, which improves over the performance of the linear regression algorithm by 16.73%, followed by extremely randomized trees (16.40%), XGBoost (15.92%), loess (15.36%), random forests (12.75%), polyMARS (12.36%), MARS (4.74%), lasso (0.11%) and support vector regression (-0.45%). Based on the obtained large-scale results, we propose super learning for daily streamflow forecasting.

2019-09-02 — Um Mecanismo de Aprendizado Incremental para Detecção e Bloqueio de Mineração de Criptomoedas em Redes Definidas por Software

Authors: Helio C. Neto, M. Lopez, N. Fernandes, Diogo M. F. Mattos
Year: 2019
Publication Date: 2019-09-02
Venue: Anais do XIX Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSeg 2019)
DOI: 10.5753/sbseg.2019.13984
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Unauthorized cryptocurrency mining consumes valuable computing resources and large amounts of energy. This article proposes MineCap, a dynamic, online mechanism for detecting and blocking unauthorized cryptocurrency mining flows using machine learning in software-defined networks. MineCap develops the incremental super learning technique, a variant of the super learner applied to incremental learning. Incremental super learning gives MineCap the accuracy to classify mining flows while the mechanism learns from incoming data. The results show that the mechanism achieves 98% accuracy, 99% precision, 97% sensitivity, and 99.9% specificity, and avoids problems related to concept drift.

2019-09-01 — Nowcasting and forecasting US recessions: Evidence from the Super Learner

Authors: Benedikt Maas
Year: 2019
Publication Date: 2019-09-01
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2019-08-14 — Efficient estimation of pathwise differentiable target parameters with the undersmoothed highly adaptive lasso

Authors: M. J. van der Laan, David C. Benkeser, Weixin Cai
Year: 2019
Publication Date: 2019-08-14
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2019-0092
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
We consider estimation of a functional parameter of a realistically modeled data distribution based on observing independent and identically distributed observations. The highly adaptive lasso estimator of the functional parameter is defined as the minimizer of the empirical risk over a class of cadlag functions with finite sectional variation norm, where the functional parameter is parametrized in terms of such a class of functions. In this article we establish that this HAL estimator yields an asymptotically efficient estimator of any smooth feature of the functional parameter under a global undersmoothing condition. It is formally shown that the L1-restriction in HAL does not obstruct it from solving the score equations along paths that do not enforce this condition. Therefore, from an asymptotic point of view, the only reason for undersmoothing is that the true target function might not be complex so that the HAL-fit leaves out key basis functions that are needed to span the desired efficient influence curve of the smooth target parameter. Nonetheless, in practice undersmoothing appears to be beneficial and a simple targeted method is proposed and practically verified to perform well. We demonstrate our general result for the HAL-estimator of a treatment-specific mean and of the integrated square density. We also present simulations for these two examples confirming the theory.

2019-08-10 — Optimizing bug prediction in software testing using Super Learner

Authors: P. Nair
Year: 2019
Publication Date: 2019-08-10
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2019-08-01 — Principled Machine Learning Using the Super Learner: An Application to Predicting Prison Violence

Authors: V. Baćak, Edward H. Kennedy
Year: 2019
Publication Date: 2019-08-01
DOI: 10.1177/0049124117747301
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2019-07-11 — State-Level and County-Level Estimates of Health Care Costs Associated with Food Insecurity

Authors: Seth A. Berkowitz, S. Basu, Craig Gundersen, H. Seligman
Year: 2019
Publication Date: 2019-07-11
Venue: Preventing Chronic Disease
DOI: 10.5888/pcd16.180549
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Introduction Food insecurity, or uncertain access to food because of limited financial resources, is associated with higher health care expenditures. However, both food insecurity prevalence and health care spending vary widely in the United States. To inform public policy, we estimated state-level and county-level health care expenditures associated with food insecurity. Methods We used linked 2011–2013 National Health Interview Survey/Medical Expenditure Panel Survey (NHIS/MEPS) data to estimate average health care costs associated with food insecurity, Map the Meal Gap data to estimate state-level and county-level food insecurity prevalence (current through 2016), and Dartmouth Atlas of Health Care data to account for local variation in health care prices and intensity of use. We used targeted maximum likelihood estimation to estimate health care costs associated with food insecurity, separately for adults and children, adjusting for sociodemographic characteristics. Results Among NHIS/MEPS participants, 10,054 adults and 3,871 children met inclusion criteria. Model estimates indicated that food insecure adults had annual health care expenditures that were $1,834 (95% confidence interval [CI], $1,073–$2,595, P < .001) higher than food secure adults. For children, estimates were $80 higher, but this finding was not significant (95% CI, −$171 to $329, P = .53). The median annual health care cost associated with food insecurity was $687,041,000 (25th percentile, $239,675,000; 75th percentile, $1,140,291,000). The median annual county-level health care cost associated with food insecurity was $4,433,000 (25th percentile, $1,774,000; 75th percentile, $11,267,000). Cost variability was related primarily to food insecurity prevalence. Conclusions Health care expenditures associated with food insecurity vary substantially across states and counties. Food insecurity policies may be important mechanisms to contain health care expenditures.

2019-07-11 — Reflection on modern methods: when worlds collide-prediction, machine learning and causal inference.

Authors: T. Blakely, J. Lynch, Koen Simons, R. Bentley, Sherri Rose
Year: 2019
Publication Date: 2019-07-11
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyz132
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Causal inference requires theory and prior knowledge to structure analyses, and is not usually thought of as an arena for the application of prediction modelling. However, contemporary causal inference methods, premised on counterfactual or potential outcomes approaches, often include processing steps before the final estimation step. The purposes of this paper are: (i) to overview the recent emergence of prediction underpinning steps in contemporary causal inference methods as a useful perspective on contemporary causal inference methods, and (ii) explore the role of machine learning (as one approach to 'best prediction') in causal inference. Causal inference methods covered include propensity scores, inverse probability of treatment weights (IPTWs), G computation and targeted maximum likelihood estimation (TMLE). Machine learning has been used more for propensity scores and TMLE, and there is potential for increased use in G computation and estimation of IPTWs.

2019-07-10 — A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

Authors: Wei Zhang, Xiaodong Cui, Ulrich Finkler, G. Saon, Abdullah Kayi, A. Buyuktosunoglu, Brian Kingsbury, David S. Kung, M. Picheny
Year: 2019
Publication Date: 2019-07-10
Venue: Interspeech
DOI: 10.21437/interspeech.2019-2700
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Modern Automatic Speech Recognition (ASR) systems rely on distributed deep learning for quick training completion. To enable efficient distributed training, it is imperative that the training algorithms can converge with a large mini-batch size. In this work, we discovered that Asynchronous Decentralized Parallel Stochastic Gradient Descent (ADPSGD) can work with much larger batch size than commonly used Synchronous SGD (SSGD) algorithm. On commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3X as large as the one used in SSGD, thus enabling training at a much larger scale. Further, we proposed a Hierarchical-ADPSGD (H-ADPSGD) system in which learners on the same computing node construct a super learner via a fast allreduce implementation, and super learners deploy ADPSGD algorithm among themselves. On a 64 Nvidia V100 GPU cluster connected via a 100Gb/s Ethernet network, our system is able to train SWB-2000 to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test-set and a 13.2% WER on the Call-home (CH) test-set in 5.2 hours. To the best of our knowledge, this is the fastest ASR training system that attains this level of model accuracy for SWB-2000 task to be ever reported in the literature.

2019-07-01 — Having an Adult Child in the United States, Physical Functioning, and Unmet Needs for Care Among Older Mexican Adults.

Authors: Jacqueline M. Torres, Kara E. Rudolph, Oleg Sofrygin, R. Wong, Louise C. Walter, M. Glymour
Year: 2019
Publication Date: 2019-07-01
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001016
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND Migration of adult children may impact the health of aging parents who remain in low- and middle-income countries. Prior studies have uncovered mixed associations between adult child migration status and physical functioning of older parents; none to our knowledge has examined the impact on unmet caregiving needs. METHODS Data come from a population-based study of Mexican adults ≥50 years. We used longitudinal targeted maximum likelihood estimation to estimate associations between having an adult child US migrant and lower-body functional limitations, and both needs and unmet needs for assistance with basic or instrumental activities of daily living (ADLs/IADLs) for 11,806 respondents surveyed over an 11-year period. RESULTS For women, having an adult child US migrant at baseline and 2-year follow-up was associated with fewer lower-body functional limitations [marginal risk difference (RD) = -0.14, 95% confidence interval (CI) = -0.26, -0.01] and ADLs/IADLs (RD = -0.08, 95% CI = -0.16, -0.001) at 2-year follow-up. Having an adult child US migrant at all waves was associated with a higher prevalence of functional limitations at 11-year follow-up (RD = 0.04, 95% CI = 0.01, 0.06). Having an adult child US migrant was associated with a higher prevalence of unmet needs for assistance at 2 (RD = 0.13, 95% CI = 0.04, 0.21) and 11-year follow-up for women (RD = 0.07, 95% CI = -0.02, 0.15) and 11-year follow-up for men (RD = 0.08, 95% CI = 0.00, 0.16). CONCLUSION Having an adult child US migrant had mixed associations with physical functioning, but substantial adverse associations with unmet caregiving needs for a cohort of older adults in Mexico.

2019-06-19 — Comparaison d'estimateurs de la variance du TMLE

Authors: L. Boulanger
Year: 2019
Publication Date: 2019-06-19
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2019-06-07 — The impact of delayed switch to second-line antiretroviral therapy on mortality, depending on failure time definition and CD4 count at failure

Authors: Helen Bell-Gorrod, M. Fox, A. Boulle, H. Prozesky, R. Wood, F. Tanser, M. Davies, M. Schomaker
Year: 2019
Publication Date: 2019-06-07
Venue: bioRxiv
DOI: 10.1101/661629
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background Little is known about the functional relationship of delaying second-line treatment initiation for HIV-positive patients and mortality, given a patient’s immune status. Methods We included 7255 patients starting antiretroviral therapy between 2004-2017, from 9 South African cohorts, with virological failure and complete baseline data. We estimated the impact of switch time on the hazard of death using inverse probability of treatment weighting (IPTW) of marginal structural models. The non-linear relationship between month of switch and the 5-year survival probability, stratified by CD4 count at failure, was estimated with targeted maximum likelihood estimation (TMLE). We adjusted for measured time-varying confounding by CD4 count, viral load and visit frequency. Results 5-year mortality was estimated as 10.5% (2.2%; 18.8%) for immediate switch and as 26.6% (20.9%; 32.3%) for no switch (49.9% if CD4 count<100 cells/mm3). The hazard of death was estimated to be 0.40 (95%CI: 0.33-0.48) times lower if everyone had been switched immediately compared to never. The shorter the delay in switching, the lower the hazard of death, e.g. delaying 30-60 days reduced the hazard 0.52 (0.41-0.65) times, and 60-120 days 0.56 (0.47-0.66) times. Conclusions Early treatment switch is particularly important for patients with low CD4 counts at failure.

2019-06-03 — Can Hyperparameter Tuning Improve the Performance of a Super Learner?

Authors: Jenna Wong, Travis Manderson, M. Abrahamowicz, D. Buckeridge, R. Tamblyn
Year: 2019
Publication Date: 2019-06-03
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000001027
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2019-05-24 — More Efficient Off-Policy Evaluation through Regularized Targeted Learning

Authors: Aurélien F. Bibaut, I. Malenica, N. Vlassis, M. J. van der Laan
Year: 2019
Publication Date: 2019-05-24
Venue: International Conference on Machine Learning
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistical causal inference literature. We also introduce several variance reduction techniques that lead to impressive performance gains in off-policy evaluation. We show empirically that our estimator uniformly wins over existing off-policy evaluation methods across multiple RL environments and various levels of model misspecification. Finally, we further the existing theoretical analysis of estimators for the RL off-policy estimation problem by showing their $O_P(1/\sqrt{n})$ rate of convergence and characterizing their asymptotic distribution.

2019-05-23 — Nonparametric Bootstrap Inference for the Targeted Highly Adaptive LASSO Estimator.

Authors: Weixin Cai, M. Laan
Year: 2019
Publication Date: 2019-05-23
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso, tmle

Abstract:
The Highly-Adaptive-LASSO Targeted Minimum Loss Estimator (HAL-TMLE) is an efficient plug-in estimator of a pathwise differentiable parameter in a statistical model that at minimum (and possibly only) assumes that the sectional variation norm of the true nuisance functional parameters (i.e., the relevant part of the data distribution) is finite. It relies on an initial estimator (HAL-MLE) of the nuisance functional parameters by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm of the candidate functions is bounded by a constant, where this constant can be selected with cross-validation. In this article, we establish that the nonparametric bootstrap for the HAL-TMLE, fixing the value of the sectional variation norm at a value larger than or equal to the cross-validation selector, provides a consistent method for estimating the normal limit distribution of the HAL-TMLE. In order to optimize the finite sample coverage of the nonparametric bootstrap confidence intervals, we propose a selection method for this sectional variation norm that is based on running the nonparametric bootstrap for all values of the sectional variation norm larger than the one selected by cross-validation, and subsequently determining a value at which the width of the resulting confidence intervals reaches a plateau. We demonstrate our method for 1) nonparametric estimation of the average treatment effect based on observing on each unit a covariate vector, binary treatment, and outcome, and for 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution. In addition, we also present simulation results for these two examples demonstrating the excellent finite sample coverage of bootstrap-based confidence intervals.

2019-04-03 — A New Machine-Learning Prediction Model for Slope Deformation of an Open-Pit Mine: An Evaluation of Field Data

Authors: Sunwen Du, Guo-rui Feng, Jianmin Wang, Shizhe Feng, R. Malekian, Zhixiong Li
Year: 2019
Publication Date: 2019-04-03
Venue: Energies
DOI: 10.3390/EN12071288
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Effective monitoring of the slope deformation of an open-pit mine is essential for preventing catastrophic collapses. It is a challenging task to accurately predict slope deformation. To this end, this article proposed a new machine-learning method for slope deformation prediction. Ground-based interferometric radar (GB-SAR) was employed to collect the slope deformation data from an open-pit mine. Then, an ensemble learner, which aggregated a set of weaker learners, was proposed to mine the GB-SAR field data, delivering a slope deformation prediction model. The evaluation of the field data acquired from the Anjialing open-pit mine demonstrates that the proposed slope deformation model was able to precisely predict the slope deformation of the monitored mine. The prediction accuracy of the super learner was superior to those of all the independent weaker learners.
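
Illustrative sketch (not from the paper): the abstract describes an ensemble that outperforms its individual learners. The snippet below shows the simpler "discrete" super learner, which only selects the candidate with the lowest cross-validated risk instead of fitting a weighted combination. The two toy learners, the interleaved fold assignment, and the synthetic data are all assumptions made for illustration; they are not the learners or the GB-SAR data used by the authors.

```python
# Discrete super learner sketch: choose the candidate learner with the
# lowest K-fold cross-validated mean squared error.
# All learners and data here are hypothetical toy examples.

def fit_mean(xs, ys):
    """Baseline learner: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def cv_risk(fit, xs, ys, k=5):
    """K-fold cross-validated MSE; fold membership is index modulo k."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        valid = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_error += sum((ys[i] - model(xs[i])) ** 2 for i in valid)
    return squared_error / n

def discrete_super_learner(library, xs, ys, k=5):
    """Return (name of best learner, CV risks, best learner refit on all data)."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, risks, library[best](xs, ys)

# Synthetic data with an exact linear trend.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

library = {"mean": fit_mean, "linear": fit_linear}
best, risks, model = discrete_super_learner(library, xs, ys)
print(best, risks)
```

A full Super Learner would go one step further and estimate weights for a combination of the candidates from their cross-validated predictions; the selection step above is the part the two variants share.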

2019-04-01 — Prediction of VRC01 neutralization sensitivity by HIV-1 gp160 sequence features

Authors: Craig A. Magaret, David C. Benkeser, B. Williamson, Bhavesh Borate, Lindsay N. Carpp, I. Georgiev, Ian Setliff, A. Dingens, N. Simon, M. Carone, Christopher Simpkins, D. Montefiori, G. Alter, Wen-Han Yu, M. Juraska, P. Edlefsen, S. Karuna, N. Mgodi, Srilatha Edugupanti, P. Gilbert
Year: 2019
Publication Date: 2019-04-01
Venue: PLoS Comput. Biol.
DOI: 10.1371/journal.pcbi.1006952
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The broadly neutralizing antibody (bnAb) VRC01 is being evaluated for its efficacy to prevent HIV-1 infection in the Antibody Mediated Prevention (AMP) trials. A secondary objective of AMP utilizes sieve analysis to investigate how VRC01 prevention efficacy (PE) varies with HIV-1 envelope (Env) amino acid (AA) sequence features. An exhaustive analysis that tests how PE depends on every AA feature with sufficient variation would have low statistical power. To design an adequately powered primary sieve analysis for AMP, we modeled VRC01 neutralization as a function of Env AA sequence features of 611 HIV-1 gp160 pseudoviruses from the CATNAP database, with objectives: (1) to develop models that best predict the neutralization readouts; and (2) to rank AA features by their predictive importance with classification and regression methods. The dataset was split in half, and machine learning algorithms were applied to each half, each analyzed separately using cross-validation and hold-out validation. We selected Super Learner, a nonparametric ensemble-based cross-validated learning method, for advancement to the primary sieve analysis. This method predicted the dichotomous resistance outcome of whether the IC50 neutralization titer of VRC01 for a given Env pseudovirus is right-censored (indicating resistance) with an average validated AUC of 0.868 across the two hold-out datasets. Quantitative log IC50 was predicted with an average validated R² of 0.355. Features predicting neutralization sensitivity or resistance included 26 surface-accessible residues in the VRC01 and CD4 binding footprints, the length of gp120, the length of Env, the number of cysteines in gp120, the number of cysteines in Env, and 4 potential N-linked glycosylation sites; the top features will be advanced to the primary sieve analysis. This modeling framework may also inform the study of VRC01 in the treatment of HIV-infected persons.

2019-04-01 — Prediction of Decompensation in Patients in the Cardiac Ward

Authors: Justin Niestroy, Jiangxue Han, Jingyi Luo, Runhao Zhao, D. Lake, A. Flower
Year: 2019
Publication Date: 2019-04-01
Venue: Systems and Information Engineering Design Symposium
DOI: 10.1109/SIEDS.2019.8735602
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
This study focuses on detecting deterioration of acutely ill patients in the cardiac ward at the University of Virginia Health System. Patients in the cardiac ward are expected to recover from a variety of cardiovascular procedures, but roughly 5% of patients deteriorate and have to be transferred to the Intensive Care Unit (ICU). Previous work has shown that early warning scores utilizing vital signs and common lab results greatly lower mortality for high risk patients. To build upon these results, data were collected over the course of two years from 71 beds in three cardiac-related wards at the University of Virginia Health System. In addition to information commonly collected for early warning scores, these data also contained continuous electrocardiography (ECG) telemetry data for all patients. Given that only one percent of observations are labeled as events, the F1 score was used as the primary metric to assess the performance of each model; area under the curve (AUC) was also considered. Previous work includes the development of logistic regression models with these data resulting in an AUC of 0.73. In this work, a super learner was built to further the study by stacking logistic regression, random forest, and gradient boosting models. Furthermore, a denoising auto-encoder was created to generate computer-derived features, the results of which were fed to machine learning models mentioned previously to predict patient deterioration. The logistic regression model built on existing and computer-generated features resulted in an F1 score of 0.1 and AUC of 0.7, which is comparable to previous models built on the same patient data set. The super learner had an improvement over existing logistic regression models, with an F1 score of 0.24 and AUC of 0.79.
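
Illustrative sketch (not from the paper): the abstract motivates the F1 score by noting that only about one percent of observations are events. The toy example below, with made-up labels and two hypothetical classifiers, shows why accuracy can be misleading at that event rate while F1 is not. The counts are assumptions chosen for illustration, not the study's data.

```python
# Why F1 instead of accuracy when events are rare: a toy illustration.
# Labels and predictions below are hypothetical, not the study's data.

def accuracy(y_true, y_pred):
    """Fraction of observations classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 1000 observations, 10 of them events (a 1% event rate).
y_true = [1] * 10 + [0] * 990

# Classifier A never raises an alarm.
never_alarm = [0] * 1000

# Classifier B catches 5 of the 10 events at the cost of 15 false alarms.
detector = [1] * 5 + [0] * 5 + [1] * 15 + [0] * 975

for name, pred in [("never_alarm", never_alarm), ("detector", detector)]:
    print(name, accuracy(y_true, pred), f1_score(y_true, pred))
```

On this toy data the classifier that never raises an alarm has the higher accuracy (0.99 versus 0.98) but an F1 score of zero, whereas the detector reaches an F1 score of one third.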

2019-03-27 — Comparison of Parametric and Nonparametric Estimators for the Association Between Incident Prepregnancy Obesity and Stillbirth in a Population-Based Cohort Study.

Authors: Ya-Hui Yu, L. Bodnar, M. Brooks, K. Himes, A. Naimi
Year: 2019
Publication Date: 2019-03-27
Venue: American Journal of Epidemiology
DOI: 10.1093/AJE/KWZ081
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
While prepregnancy obesity increases risk of stillbirth, few studies have evaluated the role of newly developed obesity independent of long-standing obesity. Additionally, researchers have relied almost exclusively on parametric models, which require correct specification of an unknown function for consistent estimation. We estimated the association between incident obesity and stillbirth in a cohort constructed from linked birth and death records in Pennsylvania (2003-2013). Incident obesity was defined as body mass index (weight (kg)/height (m)²) greater than or equal to 30. We used parametric G-computation, semiparametric inverse-probability weighting, and parametric/nonparametric targeted minimum loss-based estimation (TMLE) to estimate the association between incident prepregnancy obesity and stillbirth. Compared with pregnancies from women who stayed nonobese, women who became obese prior to their next pregnancy were estimated to have 2.0 (95% confidence interval (CI): 0.5, 3.5) more stillbirths per 1,000 pregnancies using parametric G-computation. However, despite well-behaved stabilized inverse probability weights, risk differences estimated from inverse-probability weighting, nonparametric TMLE, and parametric TMLE represented 6.9 (95% CI: 3.7, 10.0), 0.4 (95% CI: 0.1, 0.7), and 2.9 (95% CI: 1.5, 4.2) excess stillbirths per 1,000 pregnancies, respectively. These results, particularly those derived from nonparametric TMLE, were highly sensitive to covariates included in the propensity score models. Our results suggest that caution is warranted when using nonparametric estimators to quantify exposure effects.

2019-03-19 — Estimating a Dynamic Effect of Soda Intake on Pediatric Dental Caries Using Targeted Maximum Likelihood Estimation Method

Authors: Sungwoo Lim, M. Tellez, A. Ismail
Year: 2019
Publication Date: 2019-03-19
Venue: Caries Research
DOI: 10.1159/000497359
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
An effect of soda intake on dental caries in young children (birth to 5 years) may vary over time. Estimating a dynamic effect may be challenging due to time-varying confounding and loss to follow-up. The purpose of this paper is to demonstrate utility of targeted maximum likelihood estimation (TMLE) method in addressing longitudinal data analysis challenges and estimating a dynamic effect of soda intake on pediatric caries. Data came from the Detroit Dental Health Project, a 4-year cohort study of low-income African-American children and caregivers. The sample included 995 child–caregiver pairs who participated in 2002–03 (W1) and were followed up in 2004–05 (W2) and 2007 (W3). The outcome was counts of caries surfaces at W3, and the exposure was child’s soda intake at W1 and W2. Time-varying covariates included caregiver’s smoking status, oral health fatalism, and social support. Forty-three percent of children consistently consumed soda at W1 and W2, whereas 21% were nonconsumers throughout 2 surveys. The remaining 35% switched intake status between W1 and W2. Association between soda intake patterns and caries was tested using TMLE. Children with a consistent soda intake had 1.03 more caries lesions at W3 than those with consistently no soda intake (95% CI 0.09–1.97) on average. If soda was consumed only at W1 or W2, an estimated effect of soda on caries development at W3 was no longer statistically significant. In conclusion, consistent soda intake during the early childhood led to one additional caries tooth surface. The study highlights utility of TMLE in pediatric caries research as it can handle modeling challenges associated with longitudinal data.

2019-03-01 — Machine learning in policy evaluation: new tools for causal inference

Authors: N. Kreif, Karla DiazOrdaz
Year: 2019
Publication Date: 2019-03-01
Venue: Oxford Research Encyclopedia of Economics and Finance
DOI: 10.1093/ACREFORE/9780190625979.013.256
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
While machine learning (ML) methods have received a lot of attention in recent years, these methods are primarily for prediction. Empirical researchers conducting policy evaluations are, on the other hand, preoccupied with causal problems, trying to answer counterfactual questions: what would have happened in the absence of a policy? Because these counterfactuals can never be directly observed (described as the “fundamental problem of causal inference”) prediction tools from the ML literature cannot be readily used for causal inference. In the last decade, major innovations have taken place incorporating supervised ML tools into estimators for causal parameters such as the average treatment effect (ATE). This holds the promise of attenuating model misspecification issues and increasing transparency in model selection. One particularly mature strand of the literature includes approaches that incorporate supervised ML approaches in the estimation of the ATE of a binary treatment, under the unconfoundedness and positivity assumptions (also known as exchangeability and overlap assumptions). This article begins by reviewing popular supervised machine learning algorithms, including tree-based methods and the lasso, as well as ensembles, with a focus on the Super Learner. Then, some specific uses of machine learning for treatment effect estimation are introduced and illustrated, namely (1) to create balance among treated and control groups, (2) to estimate so-called nuisance models (e.g., the propensity score, or conditional expectations of the outcome) in semi-parametric estimators that target causal parameters (e.g., targeted maximum likelihood estimation or the double ML estimator), and (3) the use of machine learning for variable selection in situations with a high number of covariates. Since there is no universal best estimator, whether parametric or data-adaptive, it is best practice to incorporate a semi-automated approach that can select the models best supported by the observed data, thus attenuating the reliance on subjective choices.
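
Illustrative sketch (not from the article): the abstract refers to semi-parametric estimators of the ATE that plug in two nuisance estimates, the propensity score and the conditional outcome mean. The snippet below uses the augmented inverse probability weighting (AIPW) estimator, a closely related doubly robust estimator, rather than TMLE or double ML themselves. The toy records, the single binary confounder, and the stratum-mean nuisance estimates are assumptions for illustration; in practice the nuisance models would be fitted with ML tools such as the Super Learner.

```python
# AIPW estimate of the average treatment effect on toy data.
# Records are (w, a, y): binary confounder, binary treatment, binary outcome.
# Data and nuisance estimators are hypothetical and chosen for illustration.
from collections import defaultdict

data = (
    [(0, 1, 1), (0, 1, 0)]                 # W=0, treated
    + [(0, 0, 0)] * 3 + [(0, 0, 1)]        # W=0, control
    + [(1, 1, 1)] * 3 + [(1, 1, 0)]        # W=1, treated
    + [(1, 0, 1), (1, 0, 0)]               # W=1, control
)

def fit_nuisance(data):
    """Stratum-frequency estimates of g(w) = P(A=1 | W=w) and Q(a, w) = E[Y | A=a, W=w].

    Assumes positivity: every stratum of W contains treated and control units.
    """
    n_w = defaultdict(int)
    n_treated = defaultdict(int)
    y_sum = defaultdict(float)
    y_count = defaultdict(int)
    for w, a, y in data:
        n_w[w] += 1
        n_treated[w] += a
        y_sum[(a, w)] += y
        y_count[(a, w)] += 1
    g = {w: n_treated[w] / n_w[w] for w in n_w}
    q = {key: y_sum[key] / y_count[key] for key in y_count}
    return g, q

def aipw_ate(data):
    """Plug-in outcome contrast plus an inverse-probability-weighted residual correction."""
    g, q = fit_nuisance(data)
    total = 0.0
    for w, a, y in data:
        plug_in = q[(1, w)] - q[(0, w)]
        correction = (a / g[w]) * (y - q[(1, w)]) \
            - ((1 - a) / (1 - g[w])) * (y - q[(0, w)])
        total += plug_in + correction
    return total / len(data)

def naive_difference(data):
    """Unadjusted difference in mean outcome between treated and control units."""
    treated = [y for _, a, y in data if a == 1]
    control = [y for _, a, y in data if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

ate_aipw = aipw_ate(data)
ate_naive = naive_difference(data)
print(ate_naive, ate_aipw)
```

In this toy dataset treatment is more common when W=1, which also raises the outcome, so the unadjusted difference (about 0.33) overstates the adjusted estimate (0.25). Because the nuisance estimates here are saturated stratum means, the correction term averages to zero and AIPW coincides with the plug-in estimate; with flexible ML fits the correction generally does not vanish.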

2019-02-01 — Metalworking Fluids and Colon Cancer Risk

Authors: Monika A. Izano, Oleg Sofrygin, Sally Picciotto, P. Bradshaw, E. Eisen
Year: 2019
Publication Date: 2019-02-01
Venue: Environmental Epidemiology
DOI: 10.1097/EE9.0000000000000035
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Background: Metalworking fluids (MWFs) are a class of complex mixtures of chemicals and oils, including several known carcinogens that may pose a cancer hazard to millions of workers. Reports on the relation between MWFs and incident colon cancer have been mixed. Methods: We investigated the relation between exposure to straight, soluble, and synthetic MWFs and the incidence of colon cancer in a cohort of automobile manufacturing industry workers, adjusting for time-varying confounding affected by prior exposure to reduce healthy worker survivor bias. We used longitudinal targeted minimum loss-based estimation (TMLE) to estimate the difference in the cumulative incidence of colon cancer comparing counterfactual outcomes if always exposed above to always exposed below an exposure cutoff while at work. Exposure concentration cutoffs were selected a priori at the 90th percentile of total particulate matter for each fluid type: 0.034, 0.400, and 0.003 for straight, soluble, and synthetic MWFs, respectively. Results: The estimated 25-year risk differences were 3.8% (95% confidence interval [CI] = 0.7, 7.0) for straight, 1.3% (95% CI = −2.3, 4.8) for soluble, and 0.2% (95% CI = −3.3, 3.7) for synthetic MWFs, respectively. The corresponding risk ratios were 2.39 (1.12, 5.08), 1.43 (0.67, 3.04), and 1.08 (0.51, 2.30) for straight, soluble, and synthetic MWFs, respectively. Conclusions: By controlling for time-varying confounding affected by prior exposure, a key feature of occupational cohorts, we were able to provide evidence for a causal effect of straight MWF exposure on colon cancer risk that was not found using standard analytical techniques in previous reports.

2019-02-01 — Comment: Contributions of Model Features to BART Causal Inference Performance Using ACIC 2016 Competition Data

Authors: N. Carnegie
Year: 2019
Publication Date: 2019-02-01
Venue: Statistical Science
DOI: 10.1214/18-STS682
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
With a thorough exposition of the methods and results of the 2016 Atlantic Causal Inference Competition, Dorie et al. have set a new standard for reproducibility and comparability of evaluations of causal inference methods. In particular, the open-source R package aciccomp2016, which permits reproduction of all datasets used in the competition, will be an invaluable resource for evaluation of future methodological developments. Building upon results from Dorie et al., we examine whether a set of potential modifications to Bayesian Additive Regression Trees (BART)—multiple chains in model fitting, using the propensity score as a covariate, targeted maximum likelihood estimation (TMLE), and computing symmetric confidence intervals—have a stronger impact on bias, RMSE, and confidence interval coverage in combination than they do alone. We find that bias in the estimate of SATT is minimal, regardless of the BART formulation. For purposes of CI coverage, however, all proposed modifications are beneficial— alone and in combination—but use of TMLE is least beneficial for coverage and results in considerably wider confidence intervals.

2019-01-17 — Enhancing the Performance of Classification Using Super Learning

Authors: Md Faisal Kabir, Simone A. Ludwig
Year: 2019
Publication Date: 2019-01-17
Venue: Data-Enabled Discovery and Applications
DOI: 10.1007/S41688-019-0030-0
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2019-01-14 — An educational intervention to improve knowledge about prevention against occupational asthma and allergies using targeted maximum likelihood estimation

Authors: Daloha Rodríguez-Molina, Swaantje Barth, R. Herrera, C. Rossmann, K. Radon, V. Karnowski
Year: 2019
Publication Date: 2019-01-14
Venue: International Archives of Occupational and Environmental Health
DOI: 10.1007/s00420-018-1397-1
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Purpose: Occupational asthma and allergies are potentially preventable diseases affecting 5–15% of the working population. However, the use of preventive measures is often insufficient. The aim of this study was to estimate the average treatment effect of an educational intervention designed to improve the knowledge of preventive measures against asthma and allergies in farm apprentices from Bavaria (Southern Germany). Methods: Farm apprentices at Bavarian farm schools were asked to complete a questionnaire evaluating their knowledge about preventive measures against occupational asthma and allergies (use of personal protective equipment, personal and workplace hygiene measures). Eligible apprentices were randomized by school site to either a control or an intervention group. The intervention consisted of a short educational video about use of preventive measures. Six months after the intervention, subjects were asked to complete a post-intervention questionnaire. Of the 116 apprentices (70 intervention group, 46 control group) who answered the baseline questionnaire, only 47 subjects (41%; 17 intervention group, 30 control group) also completed the follow-up questionnaire. We, therefore, estimated the causal effect of the intervention using targeted maximum likelihood estimation. Models were controlled for potential confounders. Results: Based on the targeted maximum likelihood estimation, the intervention would have increased the proportion of correct answers on all six preventive measures by 18.4% (95% confidence interval 7.3–29.6%) had all participants received the intervention vs. had they all been in the control group. Conclusions: These findings indicate the improvement of knowledge by the educational intervention.

2019 — Computational analysis of the functions of non-coding transcripts in genomic regulation

Authors: Dimitra Karagkouni, Δήμητρα Καραγκούνη
Year: 2019
DOI: 10.12681/eadd/45135
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Rapid technological advances over the last decade have enabled large-scale analyses in the field of "regulatory RNA", turning non-coding transcripts, initially considered "junk", into a research "goldmine". Non-coding transcripts play a decisive role in a remarkable number of physiological and pathological biological processes. The enormous production of data was also one of the most important drivers of the accelerating development of bioinformatics, a field specialized in the analysis of biological data and the development of the computational tools needed to process and interpret the results. This work focuses on the detailed and accurate combination of high-throughput data and modern machine learning techniques to develop algorithms aimed at the functional characterization of non-coding transcripts. The present thesis focuses on a specific class of transcripts, the microRNAs. MicroRNAs (miRNAs) are small, single-stranded, non-coding RNA molecules, ~22 nucleotides long, that bind to the Argonaute (AGO) protein to induce cleavage of the target transcript, its degradation, or the repression of its translation. Accurate characterization of their targets is considered fundamental to elucidating their regulatory role. Over the last 15 years, a plethora of computational and experimental approaches has been developed to identify the interactions of small RNAs. Currently, high-throughput techniques have enabled the discovery of new experimentally supported miRNA interactions across the whole transcriptome. This wealth of information is scattered across a large number of publications and raw datasets. During this thesis, DIANA-TarBase v8.0 was designed, a reference database dedicated to indexing experimentally supported miRNA targets. The 8th version is the first database to report more than 1 million entries, corresponding to ~700,000 unique miRNA-gene interactions, supported by more than 33 experimental methodologies applied to 592 cell types/tissues under ~430 experimental conditions. Experiments based on immunoprecipitation of the AGO protein (AGO-CLIP-Seq) are the most widespread high-throughput methodologies. The AGO-PAR-CLIP technique has been widely used to map miRNA-gene interactions at large scale in healthy or diseased cell types. The computational methods developed to analyze these data show a reduced ability to identify a large portion of the true miRNA targets. To this end, one of the aims of this thesis is to re-examine, identify, and address the current obstacles in the analysis of AGO-CLIP-Seq data. The microCLIP model is therefore presented, a computational approach for the CLIP-Seq-guided identification of miRNA interactions. microCLIP is an innovative ensemble deep learning model (super learner) and the only available computational approach that analyzes AGO-PAR-CLIP data from A to Z. It processes all AGO-enriched regions, providing functional miRNA binding sites with strong accessibility that were previously overlooked. The development of microCLIP inspired the creation of a next-generation algorithm for finding miRNA targets in the absence of an experiment. Despite the extensive development of related approaches in recent years, even state-of-the-art algorithms still achieve low accuracy and an increased number of false-positive predictions. For this reason, the microT Super Learning model was developed, which retains and upgrades the methodology of the microCLIP algorithm, strengthening its training with even more high-throughput experiments under a tissue-specific design. The new model characterizes interactions with stronger functionality and correctly detects 1.5 times more experimentally confirmed small RNA binding sites when compared against leading computational approaches. The increased performance of the microCLIP and microT algorithms in detecting miRNA interactions reveals previously overlooked regulatory events and new molecular pathways controlled by miRNAs. During this work, the doctoral candidate participated in 9 scientific publications concerning computational approaches for determining the function of non-coding transcripts, and is the first author of two of them. The candidate's main research activity and contribution to these publications concern the implementation of algorithms and automated analysis workflows for processing next-generation experimental data, and their appropriate combination, with the aim of elucidating the function of non-coding RNAs and their involvement in mechanisms of post-transcriptional gene regulation. The studies have been published in high-impact international journals, and the total citations by other authors to date, according to Google Scholar, number 942.

2019 — The Effectivenees of Management and Administration Sekolah Menengah Imtiaz Yayasan Terengganu (Smiyt) to Student Academic Excellence

Authors: Mohd Rasmawi Bin Harun, S. Yamin, Y. M. D. B. Mansor
Year: 2019
Venue: Proceedings of the 1st International Conference on Finance Economics and Business, ICOFEB 2018, 12-13 November 2018, Lhokseumawe, Aceh, Indonesia
DOI: 10.4108/eai.12-11-2018.2288782
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
In this era, the development of a country largely depends on the progress of thinking and the ability of the human capital generated by its educational system. Malaysia, which wants to drive a knowledge-based economy, will definitely require first-class human capital. The Ulul-Albab curriculum is a big contributor to the paradigm shift in the Malaysian educational system. The Sekolah Menengah IMTIAZ Yayasan Terengganu uses the Ulul-Albab curriculum. It operates in eight districts: Besut, Dungun, Kuala Berang, Kuala Terengganu, Kemaman, Kuala Nerus, Setiu and Marang. The Ulul-Albab curriculum is an integrated educational curriculum that combines pure science programs with religious programs, including the Tahfiz al-Quran. The Ulul-Albab curriculum aims to produce professionals and entrepreneurs who are not only knowledgeable but also proficient in the field of religion based on the Quran and al-Sunnah, as the Ulul-Albab generation. The main objective of the Ulul-Albab curriculum is to produce the Ulul-Albab generation, which features three components, namely Quranic, Encyclopedic and Ijtihadik. The Ulul-Albab learning method is Super Learning, which compacts the syllabus and adds wide Enrichment Model Methods and personality development for the students. The program includes parental involvement, community programs and a supplementary diet with sunnah food. In an era of an increasingly advanced and challenging education world, the government aims to produce 125,000 professional memorizers in various fields by 2050. At the Imtiaz Besut high school, Quranic memorization was found to have a positive impact on student academic excellence, based on a comparative analysis of SPM achievement from 2012 to 2017. The study concludes that Quran memorization activities should be addressed and blended into the academic field, as Quranic memorization also contributes to student academic excellence.

2019 — Targeted Maximum Likelihood Estimation and Ensemble Learning for Community-Level Data and Healthcare Claims Data

Authors: Chi Zhang
Year: 2019
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2019 — Predicting the Early Stages of the Alzheimer's Disease via Combined Brain Multi-projections and Small Datasets

Authors: K. T. Duarte, P. V. V. Paiva, Paulo S. Martins, M. A. Carvalho
Year: 2019
Venue: VISIGRAPP
DOI: 10.5220/0007404705530560
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Alzheimer's disease is a neurodegenerative disease that usually affects the elderly. It compromises a patient’s memory, his/her cognition, and perception of the environment. Alzheimer’s Disease detection in its initial stage, known as Mild Cognitive Impairment, attracts special efforts from experts due to the possibility of using drugs to delay the progression of the disease. This paper aims to provide a method for the detection of this impairment condition via the classification of brain images using Transfer Learning Deep Features and Support Vector Machine. The small number of images used in this work justifies the application of Transfer Learning, which employs weights from the initial layers of VGG19, trained for ImageNet classification, as a deep feature extractor, and then applies Support Vector Machines. Majority Voting, False-Positive Priori, and Super Learner were applied to combine the previous classifiers' predictions. The final step was a detection to assign a label to the previous voting outcomes, determining the presence or absence of an Alzheimer's pre-condition. The OASIS-1 database was used with a total of 196 images (axial, coronal, and sagittal). Our method showed a promising performance in terms of accuracy, recall and specificity.

2018 (56 papers)
2018-12-31 — Reassessing the Effectiveness of Right Heart Catheterization (RHC) in the Initial Care of Critically Ill Patients using Targeted Maximum Likelihood Estimation

Authors: Mary Akosile, Hai Zhu, Shuqin Zhang, Nils P. Johnson, Dejian Lai, Hongjian Zhu
Year: 2018
Publication Date: 2018-12-31
Venue: International Journal of Clinical Biostatistics and Biometrics
DOI: 10.23937/2469-5831/1510018
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Citation: Akosile M, Zhu H, Zhang S, Johnson NP, Lai D, et al. (2018) Reassessing the Effectiveness of Right Heart Catheterization (RHC) in the Initial Care of Critically Ill Patients using Targeted Maximum Likelihood Estimation. Int J Clin Biostat Biom 4:018. doi.org/10.23937/2469-5831/1510018 Accepted: July 26, 2018: Published: July 28, 2018 Copyright: © 2018 Akosile M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

2018-12-26 — Should a propensity score model be super? The utility of ensemble procedures for causal adjustment

Authors: Shomoita Alam, E. Moodie, D. Stephens
Year: 2018
Publication Date: 2018-12-26
Venue: Statistics in Medicine
DOI: 10.1002/sim.8075
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
In investigations of the effect of treatment on outcome, the propensity score is a tool to eliminate imbalance in the distribution of confounding variables between treatment groups. Recent work has suggested that Super Learner, an ensemble method, outperforms logistic regression in nonlinear settings; however, experience with real‐data analyses tends to show overfitting of the propensity score model using this approach. We investigated a wide range of simulated settings of varying complexities including simulations based on real data to compare the performances of logistic regression, generalized boosted models, and Super Learner in providing balance and for estimating the average treatment effect via propensity score regression, propensity score matching, and inverse probability of treatment weighting. We found that Super Learner and logistic regression are comparable in terms of covariate balance, bias, and mean squared error (MSE); however, Super Learner is computationally very expensive thus leaving no clear advantage to the more complex approach. Propensity scores estimated by generalized boosted models were inferior to the other two estimation approaches. We also found that propensity score regression adjustment was superior to either matching or inverse weighting when the form of the dependence on the treatment on the outcome is correctly specified.

2018-12-12 — Increased Alzheimer's risk during the menopause transition: A 3-year longitudinal brain imaging study

Authors: Lisa Mosconi, Aneela Rahman, I. Díaz, Xian Wu, Olivia Scheyer, H. Hristov, S. Vallabhajosula, R. Isaacson, M. D. de Leon, R. Brinton
Year: 2018
Publication Date: 2018-12-12
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0207885
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Two thirds of all persons with late-onset Alzheimer’s disease (AD) are women. Identification of sex-based molecular mechanisms underpinning the female-based prevalence of AD would advance development of therapeutic targets during the prodromal AD phase when prevention or delay in progression is most likely to be effective. This 3-year brain imaging study examines the impact of the menopausal transition on Alzheimer’s disease (AD) biomarker changes [brain β-amyloid load via 11C-PiB PET, and neurodegeneration via 18F-FDG PET and structural MRI] and cognitive performance in midlife. Fifty-nine 40–60 year-old cognitively normal participants with clinical, neuropsychological, and brain imaging exams at least 2 years apart were examined. These included 41 women [15 premenopausal controls (PRE), 14 perimenopausal (PERI), and 12 postmenopausal women (MENO)] and 18 men. We used targeted minimum loss-based estimation to evaluate AD biomarker and cognitive changes. Older age was associated with baseline Aβ and neurodegeneration markers, but not with rates of change in these biomarkers. APOE4 status influenced change in Aβ load, but not neurodegenerative changes. Longitudinally, MENO and PERI groups showed declines in estrogen-dependent memory tests as compared to men (p < .04). Adjusting for age, APOE4 status, and vascular risk confounds, the MENO and PERI groups exhibited higher rates of CMRglc decline as compared to males (p ≤ .015). The MENO group exhibited the highest rate of hippocampal volume loss (p’s ≤ .001), and higher rates of Aβ deposition than males (p < .01). CMRglc decline exceeded Aβ and atrophy changes in all female groups vs. men. These findings indicate emergence and progression of a female-specific hypometabolic AD-endophenotype during the menopausal transition. These findings suggest that the optimal window of opportunity for therapeutic intervention to prevent or delay progression of AD endophenotype in women is early in the endocrine aging process.

2018-12-01 — Smoking Is Associated with Higher Disease Activity in Rheumatoid Arthritis: A Longitudinal Study Controlling for Time-varying Covariates

Authors: M. Gianfrancesco, L. Trupin, S. Shiboski, M. J. van der Laan, J. Graf, J. Imboden, J. Yazdany, G. Schmajuk
Year: 2018
Publication Date: 2018-12-01
Venue: Journal of Rheumatology
DOI: 10.3899/jrheum.180262
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective. Prior studies around the relationship between smoking and rheumatoid arthritis (RA) disease activity have reported inconsistent findings, which may be ascribed to heterogeneous study designs or biases in statistical analyses. We examined the association between smoking and RA outcomes using statistical methods that account for time-varying confounding and loss to followup. Methods. We included 282 individuals with an RA diagnosis using electronic health record data collected at a public hospital between 2013 and 2017. Current smoking status and disease activity were assessed at each visit; covariates included sex, race/ethnicity, age, obesity, and medication use. We used longitudinal targeted maximum likelihood estimation to estimate the causal effect of smoking on disease activity measures at 27 months, and compared results to conventional longitudinal methods. Results. Smoking was associated with an increase of 0.64 units in the patient global score compared to nonsmoking (p = 0.01), and with 2.58 more swollen joints (p < 0.001). While smoking was associated with a higher clinical disease activity score (2.11), the difference was not statistically significant (p = 0.22). We found no association between smoking and physician global score, or C-reactive protein levels, and an inverse association between smoking and tender joint count (p = 0.05). Analyses using conventional methods showed a null relationship for all outcomes. Conclusion. Smoking is associated with higher levels of disease activity in RA. Causal methods may be useful for investigations of additional exposures on longitudinal outcome measures in rheumatologic disease.

2018-12-01 — Dynamic prognostic model for kidney renal clear cell carcinoma (KIRC) patients by combining clinical and genetic information

Authors: Huiling Zhao, Yuting Cao, Yue Wang, Liya Zhang, Chen Chen, Yaoyan Wang, Xiaofan Lu, Shengjie Liu, Fangrong Yan
Year: 2018
Publication Date: 2018-12-01
Venue: Scientific Reports
DOI: 10.1038/s41598-018-35981-5
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
We aim to construct a more accurate prognostic model for KIRC patients by combining clinical and genetic information, and to monitor disease progression in a dynamically updated manner. By obtaining cross-validated prognostic indices from the clinical and genetic models, we combine the two sources of information in the Super learner model, and then introduce the time-varying effect into the combined model using the landmark method for real-time dynamic prediction. The Super learner model has better prognostic performance since it can not only employ the preferable clinical prognostic model constructed by oneself or reported in the current literature, but also incorporate genome level information to strengthen effectiveness. Apart from this, four representative patients’ mortality curves are drawn in a dynamically updated manner based on the Super learner model. It is found that effectively reducing the values of the two prognostic indices through suitable treatments might achieve the purpose of controlling the mortality of patients. Combining clinical and genetic information in the Super learner model would enhance the prognostic performance and yield more accurate results for dynamic predictions. Doctors could give patients more personalized treatment with dynamically updated monitoring of disease status, as well as some candidate prognostic factors for future research.

2018-11-25 — Machine learning methods for leveraging baseline covariate information to improve the efficiency of clinical trials

Authors: Zhiwei Zhang, Shujie Ma
Year: 2018
Publication Date: 2018-11-25
Venue: Statistics in Medicine
DOI: 10.1002/sim.8054
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Clinical trials are widely considered the gold standard for treatment evaluation, and they can be highly expensive in terms of time and money. The efficiency of clinical trials can be improved by incorporating information from baseline covariates that are related to clinical outcomes. This can be done by modifying an unadjusted treatment effect estimator with an augmentation term that involves a function of covariates. The optimal augmentation is well characterized in theory but must be estimated in practice. In this article, we investigate the use of machine learning methods to estimate the optimal augmentation. We consider and compare an indirect approach based on an estimated regression function and a direct approach that aims directly to minimize the asymptotic variance of the treatment effect estimator. Theoretical considerations and simulation results indicate that the direct approach is generally preferable over the indirect approach. The direct approach can be implemented using any existing prediction algorithm that can minimize a weighted sum of squared prediction errors. Many such prediction algorithms are available, and the super learning principle can be used to combine multiple algorithms into a super learner under the direct approach. The resulting direct super learner has a desirable oracle property, is easy to implement, and performs well in realistic settings. The proposed methodology is illustrated with real data from a stroke trial.

2018-11-15 — Kernel Smoothing of the Treatment Effect CDF

Authors: Jonathan Levy, M. Laan
Year: 2018
Publication Date: 2018-11-15
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
The strata-specific treatment effect or so-called blip for a randomly drawn strata of confounders defines a random variable and a corresponding cumulative distribution function. However, the CDF is not pathwise differentiable, necessitating a kernel smoothing approach to estimate it at a given point or perhaps many points. Assuming the CDF is continuous, we derive the efficient influence curve of the kernel smoothed version of the blip CDF and a CV-TMLE estimator. The estimator is asymptotically efficient under two conditions, one of which involves a second order remainder term which, in this case, shows us that knowledge of the treatment mechanism does not guarantee a consistent estimate. The remainder term also teaches us exactly how well we need to estimate the nuisance parameters (outcome model and treatment mechanism) to guarantee asymptotic efficiency. Through simulations we verify theoretical properties of the estimator and show the importance of machine learning over conventional regression approaches to fitting the nuisance parameters. We also derive the bias and variance of the estimator, the orders of which are analogous to a kernel density estimator. This estimator opens up the possibility of developing methodology for optimal choice of the kernel and bandwidth to form confidence bounds for the CDF itself.

2018-11-12 — An Easy Implementation of CV-TMLE

Authors: Jonathan Levy
Year: 2018
Publication Date: 2018-11-12
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
In the world of targeted learning, the cross-validated targeted maximum likelihood estimator, CV-TMLE [Zheng:2010aa], has a distinct advantage over TMLE [Laan:2006aa] in that one less condition is required of CV-TMLE in order to achieve asymptotic efficiency in the nonparametric or semiparametric settings. CV-TMLE, as originally formulated, consists of averaging usually 10 (for 10-fold cross-validation) parameter estimates, each of which is performed on a validation set separate from where the initial fit was trained. The targeting step is usually performed as a pooled regression over all validation folds, but in each fold we separately evaluate any means as well as the parameter estimate. One nice thing about CV-TMLE is that we average 10 plug-in estimates, so the plug-in quality of preserving the natural parameter bounds is respected. Our adjustment of this procedure also preserves the plug-in characteristic and avoids the Donsker condition. The advantage of our procedure is that the implementation of the targeting is identical to that of a regular TMLE, once all the validation set initial predictions have been formed. In short, we stack the validation set predictions and pretend as if we have a regular TMLE, which is not necessarily quite a plug-in estimator on each fold but overall will perform asymptotically the same and might have some slight advantage, a subject for future research. In the case of average treatment effect, treatment specific mean and mean outcome under a stochastic intervention, the procedure overlaps exactly with the originally formulated CV-TMLE with a pooled regression for the targeting.
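
Illustrative sketch (not the author's code): the stacking idea described in this abstract can be pictured for the average treatment effect with binary W, A and Y. Everything below is an assumption made for illustration: the toy simulator, the saturated cell-mean initial fits, the bounding at 0.01/0.99, and the function names `simulate` and `stacked_cv_tmle_ate` are hypothetical and do not come from the paper or from any package.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def bounded_mean(values, lo=0.01, hi=0.99):
    """Mean of 0/1 values, kept away from 0 and 1 (0.5 if the cell is empty)."""
    m = sum(values) / len(values) if values else 0.5
    return min(max(m, lo), hi)


def simulate(n, seed=1):
    """Hypothetical toy data (W, A, Y), all binary; true ATE is 0.3 by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.3 + 0.4 * w)
        y = int(rng.random() < 0.2 + 0.3 * a + 0.2 * w)
        data.append((w, a, y))
    return data


def stacked_cv_tmle_ate(data, n_folds=10, seed=0):
    """Return (ATE estimate, fitted epsilon) from one pooled targeting step."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]

    # 1) Initial fits on the training folds, predictions on the held-out fold,
    #    stacked into a single table with rows (a, y, g, q1, q0).
    stacked = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        g = {w: bounded_mean([a for (wi, a, _) in train if wi == w])
             for w in (0, 1)}
        q = {(a, w): bounded_mean([y for (wi, ai, y) in train
                                   if wi == w and ai == a])
             for a in (0, 1) for w in (0, 1)}
        for i in fold:
            w, a, y = data[i]
            stacked.append((a, y, g[w], q[(1, w)], q[(0, w)]))

    # 2) Treat the stacked table like a regular TMLE: logistic fluctuation with
    #    offset logit(Q) and clever covariate H, epsilon fitted by Newton steps.
    eps = 0.0
    for _ in range(50):
        score, info = 0.0, 0.0
        for a, y, g, q1, q0 in stacked:
            h = a / g - (1 - a) / (1.0 - g)
            p = expit(logit(q1 if a == 1 else q0) + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break

    # 3) Plug in the updated predictions and average over all stacked rows.
    total = 0.0
    for a, y, g, q1, q0 in stacked:
        q1_star = expit(logit(q1) + eps / g)
        q0_star = expit(logit(q0) - eps / (1.0 - g))
        total += q1_star - q0_star
    return total / len(stacked), eps
```

Calling `stacked_cv_tmle_ate(simulate(2000))` returns the estimate together with the fitted fluctuation parameter. The saturated initial fits keep the sketch dependency-free; in practice a data-adaptive learner would normally supply the out-of-fold predictions.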

2018-11-09 — A fundamental measure of treatment effect heterogeneity

Authors: Jonathan Levy, M. J. van der Laan, A. Hubbard, R. Pirracchio
Year: 2018
Publication Date: 2018-11-09
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2019-0003
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
The stratum-specific treatment effect function is a random variable giving the average treatment effect (ATE) for a randomly drawn stratum of potential confounders a clinician may use to assign treatment. In addition to the ATE, the variance of the stratum-specific treatment effect function is fundamental in determining the heterogeneity of treatment effect values. We offer a non-parametric plug-in estimator, the targeted maximum likelihood estimator (TMLE) and the cross-validated TMLE (CV-TMLE), to simultaneously estimate both the average and variance of the stratum-specific treatment effect function. The CV-TMLE is preferable because it guarantees asymptotic efficiency under two conditions without needing entropy conditions on the initial fits of the outcome model and treatment mechanism, as required by TMLE. Particularly, in circumstances where data adaptive fitting methods are very important to eliminate bias but hold no guarantee of satisfying the entropy condition, we show that the CV-TMLE sampling distributions maintain normality with a lower mean squared error than TMLE. In addition to verifying the theoretical properties of TMLE and CV-TMLE through simulations, we highlight some of the challenges in estimating the variance of the treatment effect, which lack double robustness and might be biased if the true variance is small and sample size insufficient.

2018-11-03 — Canonical Least Favorable Submodels: A New TMLE Procedure for Multidimensional Parameters

Authors: Jonathan Levy
Year: 2018
Publication Date: 2018-11-03
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
This paper is a fundamental addition to the world of targeted maximum likelihood estimation (TMLE) (or likewise, targeted minimum loss estimation) for simultaneous estimation of multi-dimensional parameters of interest. TMLE, as part of the targeted learning framework, offers a crucial step in constructing efficient plug-in estimators for nonparametric or semiparametric models. The so-called targeting step of targeted learning, involves fluctuating the initial fit of the model in a way that maximally adjusts the plug-in estimate per change in the log likelihood. Previously for multidimensional parameters of interest, iterative TMLE's were constructed using locally least favorable submodels as defined in van der Laan and Gruber, 2016, which are indexed by a multidimensional fluctuation parameter. In this paper we define a canonical least favorable submodel in terms of a single dimensional epsilon for a $d$-dimensional parameter of interest. One can view the clfm as the iterative analog to the one-step TMLE as constructed in van der Laan and Gruber, 2016. It is currently implemented in several software packages we provide in the last section. Using a single epsilon for the targeting step in TMLE could be useful for high dimensional parameters, where using a fluctuation parameter of the same dimension as the parameter of interest could suffer the consequences of curse of dimensionality. The clfm also enables placing the so-called clever covariate denominator as an inverse weight in an offset intercept model. It has been shown that such weighting mitigates the effect of large inverse weights sometimes caused by near positivity violations.

2018-10-31 — Causal inference with multiple concurrent medications: A comparison of methods and an application in multidrug-resistant tuberculosis

Authors: Arman Alam Siddique, M. Schnitzer, Asma Bahamyirou, Guanbo Wang, T. Holtz, G. Migliori, G. Sotgiu, N. Gandhi, M. H. Vargas, D. Menzies, A. Benedetti
Year: 2018
Publication Date: 2018-10-31
Venue: Statistical Methods in Medical Research
DOI: 10.1177/0962280218808817
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
This paper investigates different approaches for causal estimation under multiple concurrent medications. Our parameter of interest is the marginal mean counterfactual outcome under different combinations of medications. We explore parametric and non-parametric methods to estimate the generalized propensity score. We then apply three causal estimation approaches (inverse probability of treatment weighting, propensity score adjustment, and targeted maximum likelihood estimation) to estimate the causal parameter of interest. Focusing on the estimation of the expected outcome under the most prevalent regimens, we compare the results obtained using these methods in a simulation study with four potentially concurrent medications. We perform a second simulation study in which some combinations of medications may occur rarely or not occur at all in the dataset. Finally, we apply the methods explored to contrast the probability of patient treatment success for the most prevalent regimens of antimicrobial agents for patients with multidrug-resistant pulmonary tuberculosis.

2018-10-29 — Complier Stochastic Direct Effects: Identification and Robust Estimation

Authors: K. Rudolph, Oleg Sofrygin, M. J. van der Laan
Year: 2018
Publication Date: 2018-10-29
Venue: Journal of the American Statistical Association
DOI: 10.1080/01621459.2019.1704292
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Mediation analysis is critical to understanding the mechanisms underlying exposure-outcome relationships. In this article, we identify the instrumental variable-direct effect of the exposure on the outcome not through the mediator, using randomization of the instrument. We call this estimand the complier stochastic direct effect (CSDE). To our knowledge, such an estimand has not previously been considered or estimated. We propose and evaluate several estimators for the CSDE: a ratio of inverse-probability of treatment-weighted estimators (IPTW), a ratio of estimating equation estimators (EE), a ratio of targeted minimum loss-based estimators (TMLE), and a TMLE that targets the CSDE directly. These estimators are applicable for a variety of study designs, including randomized encouragement trials, like the Moving to Opportunity housing voucher experiment we consider as an illustrative example, treatment discontinuities, and Mendelian randomization. We found the IPTW estimator to be the most sensitive to finite sample bias, resulting in bias of over 40% even when all models were correctly specified in a sample size of N = 100. In contrast, the EE estimator and TMLE that targets the CSDE directly were far less sensitive. The EE and TML estimators also have advantages in terms of efficiency and reduced reliance on correct parametric model specification. Supplementary materials for this article are available online.

2018-10-06 — Robust variance estimation and inference for causal effect estimation

Authors: Linh Tran, M. Petersen, Joshua Schwab, M. J. van der Laan
Year: 2018
Publication Date: 2018-10-06
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2021-0067
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
We present two novel approaches to variance estimation of semi-parametric efficient point estimators of the treatment-specific mean: (i) a robust approach that directly targets the variance of the influence function (IF) as a counterfactual mean outcome and (ii) a modified non-parametric bootstrap-based approach. The performance of these approaches to variance estimation is compared to variance estimation based on the sample variance of the empirical IF in simulations across different levels of positivity violations and treatment effect sizes. In this article, we focus on estimation of the nuisance parameters using correctly specified parametric models for the treatment mechanism in order to highlight the challenges posed by violation of positivity assumptions (distinct from the challenges posed by non-parametric estimation of the nuisance parameters). Results demonstrate that (1) variance estimation based on the empirical IF may provide highly anti-conservative confidence interval coverage (as reported previously), (2) the proposed robust approach to variance estimation in this setting provides conservative coverage, and (3) the proposed modified bootstrap maintains close to nominal coverage and improves power. In the appendix, we (a) generalize the robust approach of estimating variance to marginal structural working models and (b) provide a proof of the consistency of the targeted minimum loss-based estimation bootstrap.

2018-09-25 — Predicting Outcome of Endovascular Treatment for Acute Ischemic Stroke: Potential Value of Machine Learning Algorithms

Authors: H. V. van Os, L. A. Ramos, A. Hilbert, Matthijs van Leeuwen, M. V. van Walderveen, N. Kruyt, D. Dippel, E. Steyerberg, I. van der Schaaf, Hester F. Lingsma, W. Schonewille, C. Majoie, S. Olabarriaga, K. Zwinderman, E. Venema, H. Marquering, M. Wermer
Year: 2018
Publication Date: 2018-09-25
Venue: Frontiers in Neurology
DOI: 10.3389/fneur.2018.00784
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: Endovascular treatment (EVT) is effective for stroke patients with a large vessel occlusion (LVO) of the anterior circulation. To further improve personalized stroke care, it is essential to accurately predict outcome after EVT. Machine learning might outperform classical prediction methods as it is capable of addressing complex interactions and non-linear relations between variables. Methods: We included patients from the Multicenter Randomized Clinical Trial of Endovascular Treatment for Acute Ischemic Stroke in the Netherlands (MR CLEAN) Registry, an observational cohort of LVO patients treated with EVT. We applied the following machine learning algorithms: Random Forests, Support Vector Machine, Neural Network, and Super Learner and compared their predictive value with classic logistic regression models using various variable selection methodologies. Outcome variables were good reperfusion (post-mTICI ≥ 2b) and functional independence (modified Rankin Scale ≤2) at 3 months using (1) only baseline variables and (2) baseline and treatment variables. Area under the ROC-curves (AUC) and difference of mean AUC between the models were assessed. Results: We included 1,383 EVT patients, with good reperfusion in 531 (38%) and functional independence in 525 (38%) patients. Machine learning and logistic regression models all performed poorly in predicting good reperfusion (range mean AUC: 0.53–0.57), and moderately in predicting 3-months functional independence (range mean AUC: 0.77–0.79) using only baseline variables. All models performed well in predicting 3-months functional independence using both baseline and treatment variables (range mean AUC: 0.88–0.91) with a negligible difference of mean AUC (0.01; 95%CI: 0.00–0.01) between best performing machine learning algorithm (Random Forests) and best performing logistic regression model (based on prior knowledge). 
Conclusion: In patients with LVO machine learning algorithms did not outperform logistic regression models in predicting reperfusion and 3-months functional independence after endovascular treatment. For all models at time of admission radiological outcome was more difficult to predict than clinical outcome.

2018-09-24 — Measures of Maternal Psycho-Social Stress and Biomarkers of Stress Response in the Maternal-Fetal Unit

Authors: Monika A. Izano, L. Cushing, Jue Lin, Susan Fisher, T. Woodruff, R. Morello-Frosch
Year: 2018
Publication Date: 2018-09-24
Venue: ISEE Conference Abstracts
DOI: 10.1289/isesisee.2018.p03.2700
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
In a cohort of 500 pregnant women recruited from two San Francisco hospitals serving economically and ethnically diverse populations, we used Targeted Minimum Loss-Based Estimation (TMLE) to evalua...

2018-09-22 — Statistical Learning Methods to Determine Immune Correlates of Herpes Zoster in Vaccine Efficacy Trials

Authors: P. Gilbert, Alexander Luedtke
Year: 2018
Publication Date: 2018-09-22
Venue: Journal of Infectious Diseases
DOI: 10.1093/infdis/jiy421
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Using Super Learner, a machine learning statistical method, we assessed varicella zoster virus-specific glycoprotein-based enzyme-linked immunosorbent assay (gpELISA) antibody titer as an individual-level signature of herpes zoster (HZ) risk in the Zostavax Efficacy and Safety Trial. Gender and pre- and postvaccination gpELISA titers had moderate ability to predict whether a 50-59 year old experienced HZ over 1-2 years of follow-up, with equal classification accuracy (cross-validated area under the receiver operator curve = 0.65) for vaccine and placebo recipients. Previous analyses suggested that fold-rise gpELISA titer is a statistical correlate of protection and supported the hypothesis that it is not a mechanistic correlate of protection. Our results also support this hypothesis.

2018-09-06 — microCLIP super learning framework uncovers functional transcriptome-wide miRNA interactions

Authors: M. Paraskevopoulou, Dimitra Karagkouni, I. Vlachos, Spyros Tastsoglou, A. Hatzigeorgiou
Year: 2018
Publication Date: 2018-09-06
Venue: Nature Communications
DOI: 10.1038/s41467-018-06046-y
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Argonaute crosslinking and immunoprecipitation (CLIP) experiments are the most widely used high-throughput methodologies for miRNA targetome characterization. The analysis of Photoactivatable Ribonucleoside-Enhanced (PAR) CLIP methodology focuses on sequence clusters containing T-to-C conversions. Here, we demonstrate for the first time that the non-T-to-C clusters, frequently observed in PAR-CLIP experiments, exhibit functional miRNA-binding events and strong RNA accessibility. This discovery is based on the analysis of an extensive compendium of bona fide miRNA-binding events, and is further supported by numerous miRNA perturbation experiments and structural sequencing data. The incorporation of these previously neglected clusters yields an average of 14% increase in miRNA-target interactions per PAR-CLIP library. Our findings are integrated in microCLIP (www.microrna.gr/microCLIP), a cutting-edge framework that combines deep learning classifiers under a super learning scheme. The increased performance of microCLIP in CLIP-Seq-guided detection of miRNA interactions, uncovers previously elusive regulatory events and miRNA-controlled pathways. AGO-PAR-CLIP is widely used for high-throughput miRNA target characterization. Here, the authors show that the previously neglected non-T-to-C clusters denote functional miRNA binding events, and develop microCLIP, a super learning framework that accurately detects miRNA interactions.

2018-09-03 — Robust Estimation of Data-Dependent Causal Effects based on Observing a Single Time-Series

Authors: M. Laan, I. Malenica
Year: 2018
Publication Date: 2018-09-03
Venue: arXiv.org
DOI: 10.48550/arXiv.1809.00734
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Consider the case that one observes a single time-series, where at each time t one observes a data record O(t) involving treatment nodes A(t), possible covariates L(t) and an outcome node Y(t). The data record at time t carries information for an (potentially causal) effect of the treatment A(t) on the outcome Y(t), in the context defined by a fixed dimensional summary measure Co(t). We are concerned with defining causal effects that can be consistently estimated, with valid inference, for sequentially randomized experiments without further assumptions. More generally, we consider the case when the (possibly causal) effects can be estimated in a double robust manner, analogue to double robust estimation of effects in the i.i.d. causal inference literature. We propose a general class of averages of conditional (context-specific) causal parameters that can be estimated in a double robust manner, therefore fully utilizing the sequential randomization. We propose a targeted maximum likelihood estimator (TMLE) of these causal parameters, and present a general theorem establishing the asymptotic consistency and normality of the TMLE. We extend our general framework to a number of typically studied causal target parameters, including a sequentially adaptive design within a single unit that learns the optimal treatment rule for the unit over time. Our work opens up robust statistical inference for causal questions based on observing a single time-series on a particular unit.

2018-08-12 — Bagged one‐to‐one matching for efficient and robust treatment effect estimation

Authors: Lauren R. Samuels, R. Greevy
Year: 2018
Publication Date: 2018-08-12
Venue: Statistics in Medicine
DOI: 10.1002/sim.7926
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Observational studies present challenges due to bias from imbalance in baseline confounders. One‐to‐one matching (OOM), a popular cohort‐construction technique for observational studies, reduces bias and provides a compelling basis for inference but generally leads to at least some loss of efficiency due to the exclusion of potentially informative subjects. We introduce the bagged one‐to‐one matching (BOOM) estimator, which combines the bias‐reducing properties of OOM with the variance‐reducing properties of bootstrap aggregation (bagging). We describe the BOOM algorithm in detail, provide R code for its implementation, and investigate its performance in simulation studies and a case study. In the simulation studies, under different types of model misspecification, we compare the BOOM estimator's performance in terms of mean squared error, bias, variance, accuracy of standard error estimation, and coverage of nominal 95% confidence intervals to that of OOM and to that of ordinary least squares estimation, inverse probability weighting, and targeted maximum likelihood estimation, all on the full unmatched cohort. In our simulations, the BOOM estimator achieves as much bias reduction as the estimator based on OOM, while having much lower variance. In all of the settings examined in the simulations, the BOOM's mean squared error is comparable to or better than that of the comparison methods. In the case study, BOOM yields estimates similar to those from the established methods, with narrower 95% confidence intervals.
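
Illustrative sketch (not the authors' R code): the general idea of bagging a one-to-one matching estimator can be pictured as below. The abstract does not spell out the algorithm, so the specifics here are assumptions made for illustration only: greedy nearest-neighbour matching without replacement on a single pre-computed score, a plain mean of matched-pair differences, and the hypothetical names `matched_effect`, `bagged_matching_effect` and `simulate`.

```python
import random


def matched_effect(sample):
    """Mean treated-minus-control outcome difference over greedy 1:1 matches.

    sample: list of (score, treated, outcome); each control is used at most once.
    """
    treated = [(s, y) for (s, t, y) in sample if t == 1]
    controls = [(s, y) for (s, t, y) in sample if t == 0]
    diffs = []
    for s_t, y_t in treated:
        if not controls:
            break  # no controls left to match
        j = min(range(len(controls)), key=lambda k: abs(controls[k][0] - s_t))
        diffs.append(y_t - controls.pop(j)[1])
    return sum(diffs) / len(diffs) if diffs else None


def bagged_matching_effect(data, n_boot=50, seed=0):
    """Average the matching estimate over bootstrap resamples of the cohort."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # draw with replacement
        est = matched_effect(resample)
        if est is not None:
            estimates.append(est)
    return sum(estimates) / len(estimates)


def simulate(n, seed=1):
    """Hypothetical toy cohort: the score confounds treatment and outcome;
    the treatment effect is 2.0 for everyone by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.random()
        t = int(rng.random() < 0.2 + 0.3 * s)
        y = 3.0 * s + 2.0 * t + rng.gauss(0.0, 0.5)
        data.append((s, t, y))
    return data
```

A call such as `bagged_matching_effect(simulate(200))` averages 50 matched estimates. Each resample yields a different matched cohort, so subjects that a single matching run would discard can still contribute to the averaged estimate.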

2018-08-08 — Evaluating Public Health Interventions: 8. Causal Inference for Time-Invariant Interventions

Authors: D. Spiegelman, Xin Zhou
Year: 2018
Publication Date: 2018-08-08
Venue: American Journal of Public Health
DOI: 10.2105/AJPH.2018.304530
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
We provide an overview of classical and newer methods for the control of confounding of time-invariant interventions to permit causal inference in public health evaluations. We estimated the causal effect of gender on all-cause mortality in a large HIV care and treatment program supported by the President’s Emergency Program for AIDS Relief in Dar es Salaam, Tanzania, between 2004 and 2012. We compared results from multivariable modeling, three propensity score methods, inverse-probability weighting, doubly robust methods, and targeted maximum likelihood estimation. Considerable confounding was evident, and, as expected by theory, all methods considered gave the same result, a statistically significant approximately 20% increased mortality rate in men. In general, there is no clear advantage of any of these methods for causal inference over classical multivariable modeling, from the point of view of either bias reduction or efficiency. Rather, given sufficient data to adequately fit the multivariable model to the data, multivariable modeling will yield causal estimates with the greatest statistical efficiency. All methods can adjust only for well-measured confounders; if there are unmeasured or poorly measured confounders, none of these methods will yield causal estimates.

2018-08-01 — The Balance Super Learner: A robust adaptation of the Super Learner to improve estimation of the average treatment effect in the treated based on propensity score matching

Authors: R. Pirracchio, M. Carone
Year: 2018
Publication Date: 2018-08-01
Venue: Statistical Methods in Medical Research
DOI: 10.1177/0962280216682055
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2018-07-16 — Life‐course neighbourhood opportunity and racial‐ethnic disparities in risk of preterm birth

Authors: M. Pearl, J. Ahern, A. Hubbard, B. Laraia, B. Shrimali, Victor Poon, M. Kharrazi
Year: 2018
Publication Date: 2018-07-16
Venue: Paediatric and Perinatal Epidemiology
DOI: 10.1111/ppe.12482
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
BACKGROUND: Neighbourhood opportunity, measured by poverty, income and deprivation, has been associated with preterm birth; however, little is known about the contribution of early-life and life-course neighbourhood opportunity to preterm birth risk and racial-ethnic disparities. We examined maternal early-life and adult neighbourhood opportunity in relation to risk of preterm birth and racial-ethnic disparities in a population-based cohort of women under age 30. METHODS: We linked census tract poverty data to 2 generations of California births from 1982-2011 for 403 315 white, black, or Latina mother-infant pairs. We estimated the risk of preterm birth, and risk difference (RD) comparing low opportunity (≥20% poverty) in early life or adulthood to high opportunity using targeted maximum likelihood estimation. RESULTS: At each time point, low opportunity was related to increased preterm birth risk compared to higher opportunity neighbourhoods for white, black and Latina mothers (RDs 0.3-0.7%). Compared to high opportunity at both time points, risk differences were generally highest for sustained low opportunity (RD 1.5, 1.3, and 0.7% for white, black and Latina mothers, respectively); risk was elevated with downward mobility (RD 0.7, 1.3, and 0.4% for white, black and Latina mothers, respectively), and with upward mobility only among black mothers (RD 1.2%). The black-white preterm birth disparity was reduced by 22% under high life-course opportunity. CONCLUSIONS: Early-life and sustained exposure to residential poverty is related to increased PTB risk, particularly among black women, and may partially explain persistent black-white disparities.

2018-07-02 — Longitudinal associations between having an adult child migrant and depressive symptoms among older adults in the Mexican Health and Aging Study.

Authors: Jacqueline M. Torres, K. Rudolph, Oleg Sofrygin, M. Glymour, R. Wong
Year: 2018
Publication Date: 2018-07-02
Venue: International Journal of Epidemiology
DOI: 10.1093/ije/dyy112
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background Migration may impact the mental health of family members who remain in places of origin. We examined longitudinal associations between having an adult child migrant and mental health, for middle-aged and older Mexican adults accounting for complex time-varying confounding. Methods Mexican Health and Aging Study cohort (N = 11 806) respondents ≥50 years completed a 9-item past-week depressive symptoms scale; scores of ≥5 reflected elevated depressive symptoms. Expected risk differences (RD) for elevated depressive symptoms at each wave due to having at least one (versus no) adult child migrant in the US or in another Mexican city were estimated with longitudinal targeted maximum likelihood estimation. Results Women with at least one adult child in the US had a higher adjusted baseline prevalence of elevated depressive symptoms (RD: 0.063, 95% CI: 0.035, 0.091) compared to women with no adult children in the US. Men with at least one child in another Mexican city at all three study waves had a lower adjusted prevalence of elevated depressive symptoms at 11-year follow-up (RD: -0.042, 95% CI: -0.082, -0.003) compared to those with no internal migrant children over those waves. For men and women with ≤3 total children, adverse associations between having an adult child in the US and depressive symptoms persisted beyond baseline. Conclusions Associations between having an adult child migrant and depressive symptoms varied by respondent gender, family size, and the location of the child migrant. Trends in population aging and migration bring new urgency to examining associations with other outcomes and in other settings.

2018-07-01 — Benchmarking deep learning models on large healthcare datasets

Authors: S. Purushotham, Chuizheng Meng, Zhengping Che, Yan Liu
Year: 2018
Publication Date: 2018-07-01
Venue: Journal of Biomedical Informatics
DOI: 10.1016/j.jbi.2018.04.007
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models.

2018-06-18 — Robust inference on the average treatment effect using the outcome highly adaptive lasso

Authors: Cheng Ju, David C. Benkeser, M. J. van der Laan
Year: 2018
Publication Date: 2018-06-18
Venue: Biometrics
DOI: 10.1111/biom.13121
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Many estimators of the average effect of a treatment on an outcome require estimation of the propensity score, the outcome regression, or both. It is often beneficial to utilize flexible techniques, such as semiparametric regression or machine learning, to estimate these quantities. However, optimal estimation of these regressions does not necessarily lead to optimal estimation of the average treatment effect, particularly in settings with strong instrumental variables. A recent proposal addressed these issues via the outcome‐adaptive lasso, a penalized regression technique for estimating the propensity score that seeks to minimize the impact of instrumental variables on treatment effect estimators. However, a notable limitation of this approach is that its application is restricted to parametric models. We propose a more flexible alternative that we call the outcome highly adaptive lasso. We discuss the large sample theory for this estimator and propose closed‐form confidence intervals based on the proposed estimator. We show via simulation that our method offers benefits over several popular approaches.

2018-06-01 — OP0198 Combined effects of tumour necrosis factor inhibitors and nsaids on radiographic progression in ankylosing spondylitis

Authors: L. Gensler, Milena Gianfrancesco, M. H. Weisman, Matthew A. Brown, Minjae Lee, Thomas Learch, M. Rahbar, J. Reveille, M. Ward
Year: 2018
Publication Date: 2018-06-01
Venue: THURSDAY, 14 JUNE 2018
DOI: 10.1136/annrheumdis-2018-eular.4027
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background The potential of TNFi or NSAIDs to reduce radiographic progression in AS is uncertain and causal effects of both exposures on radiographic progression have not been convincingly demonstrated. In addition, no study has evaluated whether effects are comparable among different NSAIDs in this setting. Objectives The objective of this study was to explore causal effects of NSAIDs and TNFi on radiographic progression in Ankylosing Spondylitis (AS) and to compare effects of celecoxib to other NSAIDs. Methods We included all patients meeting the modified New York criteria in a prospective cohort with at least 4 years of clinical and radiographic follow up. Clinical and medication data were collected every 6 months and radiographs were performed at baseline and every 2 years. We used longitudinal targeted maximum likelihood estimation to estimate the causal effect of TNFi and NSAIDs (using the NSAID index) on radiographic progression as measured by the modified Stoke Ankylosing Spondylitis Spine Score (mSASSS) at 2 and 4 years, accounting for time-varying covariates. We controlled for sex, race/ethnicity, education, symptom duration, enrollment year, number of years on TNFi, symptom duration at time of TNFi start, baseline mSASSS, ASDAS-CRP, current smoking, and missed visit status.

Abstract OP0198 – Table 1. Comparing TNF use vs no TNF use, given NSAID use at time t
NSAID use at time t | Outcome | TNF use | No TNF use | Mean Difference | P-value
No NSAID (=0) | mSASSS @ 2 years | 13.94 | 14.92 | -0.98 (-2.77, 0.81) | 0.28
No NSAID (=0) | mSASSS @ 4 years | 16.12 | 15.62 | 0.50 (-0.63, 1.64) | 0.38
Low NSAID (>0 and <50) | mSASSS @ 2 years | 15.43 | 15.49 | -0.06 (-1.63, 1.51) | 0.94
Low NSAID (>0 and <50) | mSASSS @ 4 years | 15.52 | 16.76 | -1.24 (-1.80, -0.68) | <0.001
High NSAID (>=50) | mSASSS @ 2 years | 14.79 | 15.13 | -0.34 (-1.46, 0.78) | 0.56
High NSAID (>=50) | mSASSS @ 4 years | 14.17 | 17.47 | -3.31 (-4.02, -2.59) | <0.001
NSAID = celecoxib | mSASSS @ 2 years | 11.63 | 15.62 | -3.98 (-4.51, -3.45) | <0.001
NSAID = celecoxib | mSASSS @ 4 years | 14.37 | 19.06 | -4.69 (-5.08, -4.30) | <0.001

Results Of the 519 patients, 75% were male with a baseline mean (SD) age and symptom duration of 41.4 (13.2) and 16.8 (12.5) years respectively. The baseline mean (SD) mSASSS was 14.2 (19.6). At baseline, NSAIDs were used in 66% of patients, of which ½ used an index <50 and ½ an index ≥50. TNFi were used in 46% of patients at baseline. In the setting of TNFi use, the addition of NSAID therapy was associated with less radiographic progression in a dose-related manner at 4 years. When NSAID specific effects were examined, celecoxib in combination with TNFi use was associated with the greatest reduction in radiographic progression and this was significant at both 2 and 4 years (table 1). Conclusions Dose related use of NSAIDs together with TNFi in AS patients has a synergistic effect in slowing radiographic progression with the greatest effect in those using both high-dose NSAIDs and TNFi. Celecoxib appears to confer the greatest benefit in decreasing progression with effect at both 2 and 4 years. Disclosure of Interest L. Gensler Grant/research support from: Amgen, AbbVie, UCB, Consultant for: Janssen, Lilly, Novartis, M. Gianfrancesco: None declared, M. Weisman Consultant for: Celltrion, Baylx, Novartis, Lilly, GSK, M. Brown Grant/research support from: Abbvie, Janssen, UCB, Leo Pharma, Consultant for: Abbvie, Janssen, Pfizer, Speakers bureau: Abbvie, UCB, Pfizer, M. Lee: None declared, T. Learch: None declared, M. Rahbar: None declared, J. Reveille Grant/research support from: Lilly UCB, Consultant for: Novartis Janssen Lilly UCB, M. Ward: None declared

2018-05-30 — Faculty Opinions recommendation of Targeted maximum likelihood estimation for a binary treatment: A tutorial.

Authors: R. Platt
Year: 2018
Publication Date: 2018-05-30
Venue: Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature
DOI: 10.3410/f.733103952.793546483
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2018-05-28 — A Replication Study: Just-in-Time Defect Prediction with Ensemble Learning

Authors: Steven Young, T. Abdou, A. Bener
Year: 2018
Publication Date: 2018-05-28
Venue: 2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)
DOI: 10.1145/3194104.3194110
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Just-in-time defect prediction, which is also known as change-level defect prediction, can be used to efficiently allocate resources and manage project schedules in the software testing and debugging process. Just-in-time defect prediction can reduce the amount of code to review and simplify the assignment of developers to bug fixes. This paper reports a replicated experiment and an extension comparing the prediction of defect-prone changes using traditional machine learning techniques and ensemble learning. Using datasets from six open source projects, namely Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL we replicate the original approach to verify the results of the original experiment and use them as a basis for comparison for alternatives in the approach. Our results from the replicated experiment are consistent with the original. The original approach uses a combination of data preprocessing and a two-layer ensemble of decision trees. The first layer uses bagging to form multiple random forests. The second layer stacks the forests together with equal weights. Generalizing the approach to allow the use of any arbitrary set of classifiers in the ensemble, optimizing the weights of the classifiers, and allowing additional layers, we apply a new deep ensemble approach, called deep super learner, to test the depth of the original study. The deep super learner achieves statistically significantly better results than the original approach on five of the six projects in predicting defects as measured by F1 score.

2018-05-21 — Super learning in the SAS system

Authors: A. Keil
Year: 2018
Publication Date: 2018-05-21
Venue: arXiv.org
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. A recent iteration of stacking, called super learning, has been developed as a general approach to black box supervised learning and has seen frequent usage, in part due to the availability of an R package. I develop super learning in the SAS software system using a new macro, and demonstrate its performance relative to the R package. Methods: I follow closely previous work using the R SuperLearner package and assess the performance of super learning in a number of domains. I compare the R package with the new SAS macro in a small set of simulations assessing curve fitting in a prediction model, a set of 14 publicly available datasets to assess cross-validated, expected loss, and data from a randomized trial of job seekers' training to assess the utility of super learning in causal inference using inverse probability weighting. Results: Across the simulated data and the publicly available data, the macro performed similarly to the R package, even with a different set of potential algorithms available natively in R and SAS. The example with inverse probability weighting demonstrated the ability of the SAS macro to include algorithms developed in R. Conclusions: The super learner macro performs as well as the R package at a number of tasks. Further, by extending the macro to include the use of R packages, the macro can leverage both the robust, enterprise oriented procedures in SAS and the nimble, cutting edge packages in R. In the spirit of ensemble learning, this macro extends the potential library of algorithms beyond a single software system and provides a simple avenue into machine learning in SAS.
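
To make the stacking idea in this abstract concrete, the following is a minimal sketch in plain Python; it is not the SAS macro or the R SuperLearner package. The three candidate learners, the simulated data, and the grid search over convex weights (used here in place of a proper meta-learning optimizer) are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def simulate(n, seed=0):
    """Hypothetical 1-D regression data: y = sin(3x) + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [math.sin(3.0 * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=5):
    pts = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


LIBRARY = [fit_mean, fit_linear, fit_knn]  # illustrative candidate learners


def cv_predictions(xs, ys, n_folds=5):
    """Out-of-fold predictions: one column per learner in LIBRARY."""
    n = len(xs)
    preds = [[0.0] * len(LIBRARY) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(LIBRARY):
            model = fit(tx, ty)
            for i in held_out:
                preds[i][j] = model(xs[i])
    return preds


def cv_risk(preds, ys, weights):
    """Cross-validated mean squared error of a weighted combination."""
    total = 0.0
    for row, y in zip(preds, ys):
        total += (y - sum(w * p for w, p in zip(weights, row))) ** 2
    return total / len(ys)


def super_learner_weights(preds, ys, steps=20):
    """Grid search over convex weights for three learners (assumed meta-step)."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            risk = cv_risk(preds, ys, w)
            if best is None or risk < best[0]:
                best = (risk, w)
    return best


if __name__ == "__main__":
    xs, ys = simulate(200, seed=1)
    preds = cv_predictions(xs, ys)
    risk, weights = super_learner_weights(preds, ys)
    print("weights (mean, linear, knn):", weights)
    print("cross-validated MSE:", round(risk, 4))
```

Because the grid includes the three corner points, the combined learner's cross-validated risk can never be worse than that of the best single candidate, which is the practical appeal of the approach described above.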

2018-05-15 — Comparing Costs of Traditional and Specialty Probation for People With Serious Mental Illness.

Authors: Jennifer L. Skeem, Lina Montoya, Sarah M. Manchak
Year: 2018
Publication Date: 2018-05-15
Venue: Psychiatric Services
DOI: 10.1176/appi.ps.201700498
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
OBJECTIVE: Specialty mental health probation reduces the likelihood of rearrest for people with mental illness, who are overrepresented in the justice system. This study tested whether specialty probation was associated with lower costs than traditional probation during the two years after placement in probation. METHODS: A longitudinal, matched study compared costs of behavioral health care and criminal justice contacts among 359 probationers with mental illness at prototypic specialty or traditional agencies. Compared with traditional officers, specialty officers supervised smaller caseloads, established better relationships with supervisees, and participated more in treatment. Participants and officers were interviewed, and administrative databases were integrated to capture service use and criminal justice contacts. Unit costs were attached to these data to estimate costs incurred by each participant over two years. Cost differences were estimated by using machine-learning algorithms combined with targeted maximum-likelihood estimation (TMLE), a double-robust estimator that accounts for associations between confounders and both treatment assignment and outcomes. RESULTS: Specialty probation cost $11,826 (p<.001) less per participant than traditional probation, with overall savings of about 51%. Specialty and traditional probation did not differ in criminal justice costs because the additional costs for supervision of specialty caseloads were offset by reduced recidivism. However, for behavioral health care, specialty probation cost an estimated $14,049 (p<.001) less per client than traditional probation. Greater outpatient costs were more than offset by reduced emergency, inpatient, and residential costs. CONCLUSIONS: Well-implemented specialty probation yielded substantial savings and should be considered in justice reform efforts for people with mental illness.

2018-05-01 — Marginal Structural Models with Counterfactual Effect Modifiers

Authors: Wenjing Zheng, Zhehui Luo, M. J. van der Laan
Year: 2018
Publication Date: 2018-05-01
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2018-0039
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
In health and social sciences, research questions often involve systematic assessment of the modification of treatment causal effect by patient characteristics. In longitudinal settings, time-varying or post-intervention effect modifiers are also of interest. In this work, we investigate the robust and efficient estimation of the Counterfactual-History-Adjusted Marginal Structural Model (van der Laan MJ, Petersen M. Statistical learning of origin-specific statically optimal individualized treatment rules. Int J Biostat. 2007;3), which models the conditional intervention-specific mean outcome given a counterfactual modifier history in an ideal experiment. We establish the semiparametric efficiency theory for these models, and present a substitution-based, semiparametric efficient and doubly robust estimator using the targeted maximum likelihood estimation methodology (TMLE, e.g. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat. 2006;2, van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data, 1st ed. Springer Series in Statistics. Springer, 2011). To facilitate implementation in applications where the effect modifier is high dimensional, our third contribution is a projected influence function (and the corresponding projected TMLE estimator), which retains most of the robustness of its efficient peer and can be easily implemented in applications where the use of the efficient influence function becomes taxing. We compare the projected TMLE estimator with an Inverse Probability of Treatment Weighted estimator (e.g. Robins JM. Marginal structural models. In: Proceedings of the American Statistical Association. Section on Bayesian Statistical Science, 1-10. 1997a, Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11:561–570), and a non-targeted G-computation estimator (Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Math Modell. 1986;7:1393–1512.). The comparative performance of these estimators is assessed in a simulation study. The use of the projected TMLE estimator is illustrated in a secondary data analysis for the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial where effect modifiers are subject to missing at random.

2018-04-23 — Targeted maximum likelihood estimation for a binary treatment: A tutorial

Authors: M. Luque-Fernández, M. Schomaker, B. Rachet, M. Schnitzer
Year: 2018
Publication Date: 2018-04-23
Venue: Statistics in Medicine
DOI: 10.1002/sim.7628
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G‐formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. In contrast, propensity score methods require the correct specification of an exposure model. Double‐robust methods only require correct specification of either the outcome or the exposure model. Targeted maximum likelihood estimation is a semiparametric double‐robust method that improves the chances of correct model specification by allowing for flexible estimation using (nonparametric) machine‐learning methods. It therefore requires weaker assumptions than its competitors. We provide a step‐by‐step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where assumptions about correct model specification and positivity (ie, when a study participant had 0 probability of receiving the treatment) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R‐code is provided in easy‐to‐read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in the Appendix S1 and at the following GitHub repository: https://github.com/migariane/SIM-TMLE-tutorial
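
The steps this abstract refers to (initial outcome model, propensity score, targeting step, plug-in estimate) can be sketched in a few dozen lines of plain Python. This is an illustrative toy, not the tutorial's R code: the simulated data with one binary confounder, the deliberately misspecified initial outcome model, the stratified propensity estimate, and all function names are assumptions made for this example.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate(n, seed=0):
    """Hypothetical data: binary confounder w, treatment a, outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if w else 0.3))
        y = int(rng.random() < expit(-1.0 + a + 1.5 * w))
        rows.append((w, a, y))
    return rows


def true_ate():
    """Average treatment effect implied by the simulation above."""
    return 0.5 * ((expit(0.0) - expit(-1.0)) + (expit(1.5) - expit(0.5)))


def naive_difference(rows):
    y1 = [y for _, a, y in rows if a == 1]
    y0 = [y for _, a, y in rows if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def tmle_ate(rows):
    # Step 1: initial outcome model, deliberately misspecified (ignores w).
    q_init = {}
    for a in (0, 1):
        ys = [y for _, ai, y in rows if ai == a]
        q_init[a] = sum(ys) / len(ys)
    # Step 2: propensity score P(A = 1 | W = w), estimated within each stratum.
    g = {}
    for w in (0, 1):
        treatments = [a for wi, a, _ in rows if wi == w]
        g[w] = sum(treatments) / len(treatments)

    # Step 3: fit the fluctuation parameter eps by Newton's method on a
    # logistic model with offset logit(q_init) and the clever covariate.
    def clever(a, w):
        return a / g[w] - (1 - a) / (1.0 - g[w])

    eps = 0.0
    for _ in range(50):
        score, info = 0.0, 0.0
        for w, a, y in rows:
            h = clever(a, w)
            p = expit(logit(q_init[a]) + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break
    # Step 4: plug the updated predictions into the ATE formula.
    total = 0.0
    for w, _, _ in rows:
        q1 = expit(logit(q_init[1]) + eps / g[w])
        q0 = expit(logit(q_init[0]) - eps / (1.0 - g[w]))
        total += q1 - q0
    return total / len(rows)


if __name__ == "__main__":
    data = simulate(20000, seed=1)
    print("true ATE:", round(true_ate(), 3))
    print("naive:   ", round(naive_difference(data), 3))
    print("TMLE:    ", round(tmle_ate(data), 3))
```

In this toy setting the initial outcome model is wrong but the propensity score is estimated correctly, so the targeting step pulls the estimate toward the simulated effect, which illustrates the double-robustness property mentioned above. Influence-curve-based standard errors and machine-learning fits for the nuisance models are left out of this sketch.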

2018-04-01 — Machine Learning Based Predictive Maintenance Strategy: A Super Learning Approach with Deep Neural Networks

Authors: Sujata Butte, Prashanth A R, Sainath Patil
Year: 2018
Publication Date: 2018-04-01
Venue: Workshop on Microelectronics and Electron Devices
DOI: 10.1109/WMED.2018.8360836
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2018-04-01 — Flexible Model Selection for Mechanistic Network Models via Super Learner

Authors: Sixing Chen, A. Mira, J. Onnela
Year: 2018
Publication Date: 2018-04-01
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Application of network models can be found in many domains due to the variety of data that can be represented as a network. Two prominent paradigms for modeling networks are statistical models (probabilistic models for the final observed network) and mechanistic models (models for network growth and evolution over time). Mechanistic models are easier to incorporate domain knowledge with, to study effects of interventions and to forward simulate, but typically have intractable likelihoods. As such, and in a stark contrast to statistical models, there is a dearth of work on model selection for such models, despite the otherwise large body of extant work. In this paper, we propose a procedure for mechanistic network model selection that makes use of the Super Learner framework and borrows aspects from Approximate Bayesian Computation, along with a means to quantify the uncertainty in the selected model. Our approach takes advantage of the ease to forward simulate from these models, while circumventing their intractable likelihoods at the same time. The overall process is very flexible and widely applicable. Our simulation results demonstrate the approach's ability to accurately discriminate between competing mechanistic models.

2018-04-01 — Flexible model selection for mechanistic network models

Authors: Sixing Chen, A. Mira, J. Onnela
Year: 2018
Publication Date: 2018-04-01
Venue: J. Complex Networks
DOI: 10.1093/COMNET/CNZ024
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Network models are applied across many domains where data can be represented as a network. Two prominent paradigms for modelling networks are statistical models (probabilistic models for the observed network) and mechanistic models (models for network growth and/or evolution). Mechanistic models are better suited for incorporating domain knowledge, to study effects of interventions (such as changes to specific mechanisms) and to forward simulate, but they typically have intractable likelihoods. As such, and in a stark contrast to statistical models, there is a relative dearth of research on model selection for such models despite the otherwise large body of extant work. In this article, we propose a simulator-based procedure for mechanistic network model selection that borrows aspects from Approximate Bayesian Computation along with a means to quantify the uncertainty in the selected model. To select the most suitable network model, we consider and assess the performance of several learning algorithms, most notably the so-called Super Learner, which makes our framework less sensitive to the choice of a particular learning algorithm. Our approach takes advantage of the ease to forward simulate from mechanistic network models to circumvent their intractable likelihoods. The overall process is flexible and widely applicable. Our simulation results demonstrate the approach's ability to accurately discriminate between competing mechanistic models. Finally, we showcase our approach with a protein-protein interaction network model from the literature for yeast (Saccharomyces cerevisiae).

2018-03-31 — Collaborative targeted inference from continuously indexed nuisance parameter estimators

Authors: Cheng Ju, A. Chambaz, M. J. Laan
Year: 2018
Publication Date: 2018-03-31
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Suppose that we wish to infer the value of a statistical parameter at a law from which we sample independent observations. Suppose that this parameter is smooth and that we can define two variation-independent, infinite-dimensional features of the law, its so-called Q- and G-components (comp.), such that if we estimate them consistently at a fast enough product of rates, then we can build a confidence interval (CI) with a given asymptotic level based on a plain targeted minimum loss estimator (TMLE). The estimators of the Q- and G-comp. would typically be by-products of machine learning algorithms. We focus on the case that the machine learning algorithm for the G-comp. is fine-tuned by a real-valued parameter h. Then, a plain TMLE with an h chosen by cross-validation would typically not lend itself to the construction of a CI, because the selection of h would trade-off its empirical bias with something akin to the empirical variance of the estimator of the G-comp. as opposed to that of the TMLE. A collaborative TMLE (C-TMLE) might, however, succeed in achieving the relevant trade-off. We prove that this is the case indeed. We construct a C-TMLE and show that, under high-level empirical processes conditions, and if there exists an oracle h that makes a bulky remainder term asymptotically Gaussian, then the C-TMLE is asymptotically Gaussian hence amenable to building a CI provided that its asymptotic variance can be estimated too. The construction hinges on guaranteeing that an additional, well chosen estimating equation is solved on top of the estimating equation that a plain TMLE solves. The optimal h is chosen by cross-validating an empirical criterion that guarantees the wished trade-off between empirical bias and variance. We illustrate the construction and main result with the inference of the so-called average treatment effect, where the Q-comp. consists in a marginal law and a conditional expectation, and the G-comp. is a propensity score (a conditional probability). We also conduct a multi-faceted simulation study to investigate the empirical properties of the collaborative TMLE when the G-comp. is estimated by the LASSO. Here, h is the bound on the 1-norm of the candidate coefficients. The variety of scenarios shed light on small and moderate sample properties, in the face of low-, moderate- or high-dimensional baseline covariates, and possibly positivity violation.

2018-03-27 — Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods

Authors: Fang Shi, Xiang Peng, Huan Liu, Yafei Hu, Zheng Liu, Eric Li
Year: 2018
Publication Date: 2018-03-27
Venue: Smart Structures and Materials + Nondestructive Evaluation and Health Monitoring
DOI: 10.1117/12.2300812
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2018-03-06 — Deep Super Learner: A Deep Ensemble for Classification Problems

Authors: Steven Young, T. Abdou, A. Bener
Year: 2018
Publication Date: 2018-03-06
Venue: Canadian Conference on AI
DOI: 10.1007/978-3-319-89656-4_7
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Deep learning has become very popular for tasks such as predictive modeling and pattern recognition in handling big data. Deep learning is a powerful machine learning method that extracts lower level features and feeds them forward for the next layer to identify higher level features that improve performance. However, deep neural networks have drawbacks, which include many hyper-parameters and infinite architectures, opaqueness into results, and relatively slower convergence on smaller datasets. While traditional machine learning algorithms can address these drawbacks, they are not typically capable of the performance levels achieved by deep neural networks. To improve performance, ensemble methods are used to combine multiple base learners. Super learning is an ensemble that finds the optimal combination of diverse learning algorithms. This paper proposes deep super learning as an approach which achieves log loss and accuracy results competitive to deep neural networks while employing traditional machine learning algorithms in a hierarchical structure. The deep super learner is flexible, adaptable, and easy to train with good performance across different tasks using identical hyper-parameter values. Using traditional machine learning requires fewer hyper-parameters, allows transparency into results, and has relatively fast convergence on smaller datasets. Experimental results show that the deep super learner has superior performance compared to the individual base learners, single-layer ensembles, and in some cases deep neural networks. Performance of the deep super learner may further be improved with task-specific tuning.

2018-03-01 — Super Learning as a Strategy to Improve of Teaching Practice in Higher Education Institutions in Engineering

Authors: María E. Ángel Ferrer, W. F. Silva, Remedios Pitre Redondo, Meredith Jiménez Cárdenas, David A. Franco Borré
Year: 2018
Publication Date: 2018-03-01
DOI: 10.17485/IJST/2018/V11I9/119090
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2018-02-26 — One-step Targeted Maximum Likelihood for Time-to-event Outcomes

Authors: Weixin Cai, M. Laan
Year: 2018
Publication Date: 2018-02-26
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Current targeted maximum likelihood estimation methods used to analyze time-to-event data estimate the survival probability for each time point separately, which results in estimates that are not necessarily monotone. In this paper, we present an extension of the Targeted Maximum Likelihood Estimator (TMLE) for observational time-to-event data, the one-step Targeted Maximum Likelihood Estimator for the treatment-rule specific survival curve. We construct a one-dimensional universal least favorable submodel that targets the entire survival curve, and thereby requires minimal extra fitting with data to achieve its goal of solving the efficient influence curve equation. Through a simulation study, we show that this method improves on previously proposed methods in both robustness and efficiency, and at the same time respects the monotone decreasing nature of the survival curve.

2018-02-14 — Using longitudinal targeted maximum likelihood estimation in complex settings with dynamic interventions

Authors: M. Schomaker, M. Luque-Fernández, V. Leroy, M. Davies
Year: 2018
Publication Date: 2018-02-14
Venue: Statistics in Medicine
DOI: 10.1002/sim.8340
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Longitudinal targeted maximum likelihood estimation (LTMLE) has very rarely been used to estimate dynamic treatment effects in the context of time‐dependent confounding affected by prior treatment when faced with long follow‐up times, multiple time‐varying confounders, and complex associational relationships simultaneously. Reasons for this include the potential computational burden, technical challenges, restricted modeling options for long follow‐up times, and limited practical guidance in the literature. However, LTMLE has desirable asymptotic properties, ie, it is doubly robust, and can yield valid inference when used in conjunction with machine learning. It also has the advantage of easy‐to‐calculate analytic standard errors in contrast to the g‐formula, which requires bootstrapping. We use a topical and sophisticated question from HIV treatment research to show that LTMLE can be used successfully in complex realistic settings, and we compare results to competing estimators. Our example illustrates the following practical challenges common to many epidemiological studies: (1) long follow‐up time (30 months); (2) gradually declining sample size; (3) limited support for some intervention rules of interest; (4) a high‐dimensional set of potential adjustment variables, increasing both the need and the challenge of integrating appropriate machine learning methods; and (5) consideration of collider bias. 
Our analyses, as well as simulations, shed new light on the application of LTMLE in complex and realistic settings: We show that (1) LTMLE can yield stable and good estimates, even when confronted with small samples and limited modeling options; (2) machine learning utilized with a small set of simple learners (if more complex ones cannot be fitted) can outperform a single, complex model, which is tailored to incorporate prior clinical knowledge; and (3) performance can vary considerably depending on interventions and their support in the data, and therefore critical quality checks should accompany every LTMLE analysis. We provide guidance for the practical application of LTMLE.

2018-02-08 — Data-adaptive doubly robust instrumental variable methods for treatment effect heterogeneity

Authors: Karla DiazOrdaz, R. Daniel, N. Kreif
Year: 2018
Publication Date: 2018-02-08
Link: Semantic Scholar
Matched Keywords: super learner, tmle

Abstract:
We consider the estimation of the average treatment effect in the treated as a function of baseline covariates, where there is a valid (conditional) instrument. We describe two doubly robust (DR) estimators: a locally efficient g-estimator, and a targeted minimum loss-based estimator (TMLE). These two DR estimators can be viewed as generalisations of the two-stage least squares (TSLS) method to semi-parametric models that make weaker assumptions. We exploit recent theoretical results that extend to the g-estimator the use of data-adaptive fits for the nuisance parameters. A simulation study is used to compare standard TSLS with the two DR estimators' finite-sample performance, (1) when fitted using parametric nuisance models, and (2) using data-adaptive nuisance fits, obtained from the Super Learner, an ensemble machine learning method. Data-adaptive DR estimators have lower bias and improved coverage, when compared to incorrectly specified parametric DR estimators and TSLS. When the parametric model for the treatment effect curve is correctly specified, the g-estimator outperforms all others, but when this model is misspecified, TMLE performs best, while TSLS can result in large biases and zero coverage. Finally, we illustrate the methods by reanalysing the COPERS (COping with persistent Pain, Effectiveness Research in Self-management) trial to make inference about the causal effect of treatment actually received, and the extent to which this is modified by depression at baseline.

2018-01-18 — Prediction of absolute risk of acute graft-versus-host disease following hematopoietic cell transplantation

Authors: Catherine Lee, S. Haneuse, Hai-lin Wang, Sherri Rose, S. Spellman, M. Verneris, K. Hsu, K. Fleischhauer, Stephanie J. Lee, R. Abdi
Year: 2018
Publication Date: 2018-01-18
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0190610
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Allogeneic hematopoietic cell transplantation (HCT) is the treatment of choice for a variety of hematologic malignancies and disorders. Unfortunately, acute graft-versus-host disease (GVHD) is a frequent complication of HCT. While substantial research has identified clinical, genetic and proteomic risk factors for acute GVHD, few studies have sought to develop risk prediction tools that quantify absolute risk. Such tools would be useful for: optimizing donor selection; guiding GVHD prophylaxis, post-transplant treatment and monitoring strategies; and, recruitment of patients into clinical trials. Using data on 9,651 patients who underwent first allogeneic HLA-identical sibling or unrelated donor HCT between 01/1999-12/2011 for treatment of a hematologic malignancy, we developed and evaluated a suite of risk prediction tools for: (i) acute GVHD within 100 days post-transplant and (ii) a composite endpoint of acute GVHD or death within 100 days post-transplant. We considered two sets of inputs: (i) clinical factors that are typically readily-available, included as main effects; and, (ii) main effects combined with a selection of a priori specified two-way interactions. To build the prediction tools we used the super learner, a recently developed ensemble learning statistical framework that combines results from multiple other algorithms/methods to construct a single, optimal prediction tool. Across the final super learner prediction tools, the area-under-the curve (AUC) ranged from 0.613–0.640. Improving the performance of risk prediction tools will likely require extension beyond clinical factors to include biological variables such as genetic and proteomic biomarkers, although the measurement of these factors may currently not be practical in standard clinical settings.

2018 — Utilization of Propensity Score Weighting and Targeted Maximum Likelihood Estimation in the Post Hoc Analysis of a Randomized Controlled Trial for the Treatment of Cocaine Dependence (NIDA-MDS-Modafinil-0001)

Authors: Weiwei Shan
Year: 2018
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Using Super Learner Prediction Modeling to Improve High-dimensional Propensity Score Estimation

Authors: R. Wyss, S. Schneeweiss, M. J. van der Laan, S. Lendle, Cheng Ju, J. Franklin
Year: 2018
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000000762
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Two-phase Targeted Maximum Likelihood Estimation for Mixed Data Meta-Analysis

Authors: Arman Alam Siddique
Year: 2018
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Sequential Super Learning

Authors: Sherri Rose, M. J. Laan
Year: 2018
DOI: 10.1007/978-3-319-65304-4_3
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Online Super Learning

Authors: M. J. Laan, David C. Benkeser
Year: 2018
DOI: 10.1007/978-3-319-65304-4_18
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — One-Step TMLE

Authors: M. V. D. Laan, Wilson Cai, Susan Gruber
Year: 2018
DOI: 10.1007/978-3-319-65304-4_5
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Multi-Label Super Learner: Multi-Label Classification and Improving Its Performance Using Heterogenous Ensemble Methods

Authors: Yujue Wu
Year: 2018
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Highly Adaptive Lasso (HAL)

Authors: M. J. Laan, David C. Benkeser
Year: 2018
DOI: 10.1007/978-3-319-65304-4_6
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — CV-TMLE for Nonpathwise Differentiable Target Parameters

Authors: M. Laan, Aurélien F. Bibaut, Alexander Luedtke
Year: 2018
DOI: 10.1007/978-3-319-65304-4_25
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Collaborative Targeted Maximum Likelihood Estimation to Assess Causal Effects in Observational Studies

Authors: Susan Gruber, M. J. Laan
Year: 2018
DOI: 10.1007/978-981-10-7826-2_1
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — Collaborative targeted maximum likelihood estimation for variable importance measure: Illustration for functional outcome prediction in mild traumatic brain injuries

Authors: R. Pirracchio, J. Yue, G. Manley, M. J. van der Laan, A. Hubbard
Year: 2018
Venue: Statistical Methods in Medical Research
DOI: 10.1177/0962280215627335
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — C-TMLE for Continuous Tuning

Authors: M. J. Laan, A. Chambaz, Cheng Ju
Year: 2018
DOI: 10.1007/978-3-319-65304-4_10
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2018 — A Generally Efficient HAL-TMLE

Authors: M. J. Laan
Year: 2018
DOI: 10.1007/978-3-319-65304-4_7
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2017 (33 papers)
2017-12-27 — Estimating the Causal Impact of Proximity to Gold and Copper Mines on Respiratory Diseases in Chilean Children: An Application of Targeted Maximum Likelihood Estimation

Authors: R. Herrera, U. Berger, O. V. von Ehrenstein, I. Díaz, S. Huber, Daniel Moraga Muñoz, K. Radon
Year: 2017
Publication Date: 2017-12-27
Venue: International Journal of Environmental Research and Public Health
DOI: 10.3390/ijerph15010039
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
In a town located in a desert area of Northern Chile, gold and copper open-pit mining is carried out involving explosive processes. These processes are associated with increased dust exposure, which might affect children’s respiratory health. Therefore, we aimed to quantify the causal attributable risk of living close to the mines on asthma or allergic rhinoconjunctivitis risk burden in children. Data on the prevalence of respiratory diseases and potential confounders were available from a cross-sectional survey carried out in 2009 among 288 (response: 69%) children living in the community. The proximity of the children’s home addresses to the local gold and copper mine was calculated using geographical positioning systems. We applied targeted maximum likelihood estimation to obtain the causal attributable risk (CAR) for asthma, rhinoconjunctivitis and both outcomes combined. Children living more than the first quartile away from the mines were used as the unexposed group. Based on the estimated CAR, a hypothetical intervention in which all children lived at least one quartile away from the copper mine would decrease the risk of rhinoconjunctivitis by 4.7 percentage points (CAR: −4.7; 95% confidence interval (95% CI): −8.4; −0.11); and 4.2 percentage points (CAR: −4.2; 95% CI: −7.9;−0.05) for both outcomes combined. Overall, our results suggest that a hypothetical intervention intended to increase the distance between the place of residence of the highest exposed children would reduce the prevalence of respiratory disease in the community by around four percentage points. This approach could help local policymakers in the development of efficient public health strategies.

2017-12-16 — Prediction of NB-UVB phototherapy treatment response of psoriasis patients using data mining

Authors: S. Mohamed, B. Huang, Mohand Tahar Kechadi
Year: 2017
Publication Date: 2017-12-16
Venue: IEEE International Conference on Bioinformatics and Biomedicine
DOI: 10.1109/BIBM.2017.8217804
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
NB-UVB phototherapy is one of the most common treatments administered by dermatologists for psoriasis patients. Although the treatment generally improves the condition, it can also worsen it. If a model can predict the treatment response beforehand, dermatologists can adjust the treatment accordingly. In this paper, we use data mining techniques and conduct four experiments. The best performance across all four experiments was obtained by the stacked classifier made of hyperparameter-tuned Random Forest, kSVM and ANN base learners, learned using an L1-regularized logistic regression super learner.

2017-12-08 — Acetazolamide Suppresses Multi-Drug Resistance-Related Protein 1 and P-Glycoprotein Expression by Inhibiting Aquaporins Expression in a Mesial Temporal Epilepsy Rat Model

Authors: Lei Duan, Qing Di
Year: 2017
Publication Date: 2017-12-08
Venue: Medical Science Monitor
DOI: 10.12659/MSM.903855
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Background: Mesial temporal epilepsy (MTLE) is the most common type of focal epilepsy in adults, and is often drug-resistant. This study investigated the effects of aquaporins (AQP) inhibitor on multi-drug-resistant protein expression in an MTLE rat model. Material/Methods: The MTLE rat model was established by injecting pilocarpine into rats. The MTLE rats were divided into an MTLE-6 h group, an MTLE-12 h group, and an MTLE-24 h group, together with a normal saline group (NS), to examine the AQP4 expression by using Western blot assay and immunohistochemistry assay. The other 18 MTLE model rats were used to observe the effects of the AQP4 inhibitor, acetazolamide, on the multi-drug-resistant protein 1 (MRP1) and P-glycoprotein (Pgp) by using Western blot and immunohistochemistry assays, respectively. Results: AQP4 expression was enhanced in hippocampal tissues of MTLE model rats compared to NS rats (P<0.05). More positively stained AQP4 was discovered in hippocampal tissues of MTLE model rats. AQP4 inhibitor significantly decreased multi-drug-resistant protein MRP1 and Pgp expression in the AQP4 inhibitor Interfere group and the AQP4 inhibitor Therapy group compared to the TMLE model group (P<0.05). Conclusions: The present findings confirm that the AQP4 inhibitor, acetazolamide, effectively inhibits the multi-drug-resistant protein, MRP1, and Pgp, in the MTLE rat model.

2017-11-06 — Formación Docente en Técnicas de Superaprendizaje Aplicadas a La Enseñanza de la Matemática en la Educación Secundaria

Authors: Avilner Rafael Páez Pereira
Year: 2017
Publication Date: 2017-11-06
DOI: 10.29394/SCIENTIFIC.ISSN.2542-2987.2017.2.6.1.10-28
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
The purpose of the study was to train teachers of the LB "Jose Veliz" in the teaching of mathematics through the application of super-learning techniques, based on the Participatory Action Research modality proposed by Lopez de Ceballos (2008), following the model of the Lewin cycles of action (1946), quoted by Latorre (2007), and drawing on the theories of humanism, Martinez (2009); multiple intelligences, Armstrong (2006); and the Super learning of Sambrano and Stainer (2003). Within the framework of the Critical-Social paradigm, as Qualitative Research, a plan of approach to the group was made, in which the main problems were listed through brainstorming and informal interviews and then ranked, followed by an awareness-raising process and the formulation of an overall plan of action. Among the results were 6 training workshops on techniques of breathing, relaxation, aromatherapy, music therapy, positive programming, color in the classroom, and song in mathematical algorithms, in which processes of reflection were established on the benefits or obstacles encountered in applying these techniques to transform the educational reality, leading to a didactic strategy built from the experiences gained.

2017-10-23 — Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets

Authors: S. Purushotham, Chuizheng Meng, Zhengping Che, Yan Liu
Year: 2017
Publication Date: 2017-10-23
Venue: arXiv.org
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models.

2017-10-16 — ltmle: An R Package Implementing Targeted Minimum Loss-Based Estimation for Longitudinal Data

Authors: S. Lendle, Joshua Schwab, M. Petersen, M. Laan
Year: 2017
Publication Date: 2017-10-16
DOI: 10.18637/JSS.V081.I01
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
In recent years, targeted minimum loss-based estimation methodology has been used to develop estimators of parameters in longitudinal data structures (Gruber and van der Laan 2012; Petersen, Schwab, Gruber, Blaser, Schomaker, and van der Laan 2014; Schnitzer, Moodie, van der Laan, Platt, and Klein 2013). These methods are implemented in the ltmle package for R. The ltmle package provides methods to estimate intervention-specific means and measures of association including the average treatment effect, causal odds ratio and causal risk ratio and parameters of a longitudinal working marginal structural model. The package allows for multiple time point treatments, time-varying covariates and right censoring of the outcome. In this paper we describe the usage of the ltmle package and provide examples.

2017-10-12 — A Generally Efficient Targeted Minimum Loss Based Estimator based on the Highly Adaptive Lasso

Authors: M. J. van der Laan
Year: 2017
Publication Date: 2017-10-12
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2015-0097
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

2017-10-01 — Super learning for anomaly detection in cellular networks

Authors: P. Casas, J. Vanerio
Year: 2017
Publication Date: 2017-10-01
Venue: IEEE International Conference on Wireless and Mobile Computing, Networking and Communications
DOI: 10.1109/WiMOB.2017.8115784
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2017-09-26 — Construction of environmental risk score beyond standard linear models using machine learning methods: application to metal mixtures, oxidative stress and cardiovascular disease in NHANES

Authors: S. Park, Zhangchen Zhao, B. Mukherjee
Year: 2017
Publication Date: 2017-09-26
Venue: Environmental Health
DOI: 10.1186/s12940-017-0310-9
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background: There is growing concern of health effects of exposure to pollutant mixtures. We initially proposed an Environmental Risk Score (ERS) as a summary measure to examine the risk of exposure to multi-pollutants in epidemiologic research considering only pollutant main effects. We expand the ERS by consideration of pollutant-pollutant interactions using modern machine learning methods. We illustrate the multi-pollutant approaches to predicting a marker of oxidative stress (gamma-glutamyl transferase (GGT)), a common disease pathway linking environmental exposure and numerous health endpoints. Methods: We examined 20 metal biomarkers measured in urine or whole blood from 6 cycles of the National Health and Nutrition Examination Survey (NHANES 2003–2004 to 2013–2014, n = 9664). We randomly split the data evenly into training and testing sets and constructed ERS’s of metal mixtures for GGT using adaptive elastic-net with main effects and pairwise interactions (AENET-I), Bayesian additive regression tree (BART), Bayesian kernel machine regression (BKMR), and Super Learner in the training set and evaluated their performances in the testing set. We also evaluated the associations between GGT-ERS and cardiovascular endpoints. Results: ERS based on AENET-I performed better than other approaches in terms of prediction errors in the testing set. Important metals identified in relation to GGT include cadmium (urine), dimethylarsonic acid, monomethylarsonic acid, cobalt, and barium. All ERS’s showed significant associations with systolic and diastolic blood pressure and hypertension. For hypertension, one SD increase in each ERS from AENET-I, BART and SuperLearner were associated with odds ratios of 1.26 (95% CI, 1.15, 1.38), 1.17 (1.09, 1.25), and 1.30 (1.20, 1.40), respectively.
ERS’s showed non-significant positive associations with mortality outcomes. Conclusions: ERS is a useful tool for characterizing cumulative risk from pollutant mixtures, with accounting for statistical challenges such as high degrees of correlations and pollutant-pollutant interactions. ERS constructed for an intermediate marker like GGT is predictive of related disease endpoints.

2017-09-19 — Uniform Consistency of the Highly Adaptive Lasso Estimator of Infinite Dimensional Parameters

Authors: M. Laan, Aurélien F. Bibaut
Year: 2017
Publication Date: 2017-09-19
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Consider the case that we observe $n$ independent and identically distributed copies of a random variable with a probability distribution known to be an element of a specified statistical model. We are interested in estimating an infinite dimensional target parameter that minimizes the expectation of a specified loss function. In earlier work (van der Laan, 2017) we defined an estimator that minimizes the empirical risk over all multivariate real valued cadlag functions with variation norm bounded by some constant $M$ in the parameter space, and selects $M$ with cross-validation. We referred to this estimator as the Highly-Adaptive-Lasso estimator due to the fact that the constraint can be formulated as a bound $M$ on the sum of the absolute values of the coefficients of a linear combination of a very large number of basis functions. Specifically, in the case that the target parameter is a conditional mean, it can be implemented with the standard LASSO regression estimator. In the same earlier work we proved that the HAL-estimator is consistent w.r.t. the (quadratic) loss-based dissimilarity at a rate faster than $n^{-1/2}$ (i.e., faster than $n^{-1/4}$ w.r.t. a norm), even when the parameter space is completely nonparametric. The only assumption required for this rate is that the true parameter function has a finite variation norm. The loss-based dissimilarity is often equivalent with the square of an $L^2(P_0)$-type norm. In this article, we establish that under some weak continuity condition, the HAL-estimator is also uniformly consistent.
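
The LASSO formulation mentioned in this abstract can be illustrated with a small sketch. It is a one-dimensional toy in plain Python, assuming indicator basis functions with knots at the observed points and a hand-written coordinate-descent LASSO with a fixed penalty `lam` (the abstract selects the bound by cross-validation, which is omitted here). The names `hal_basis`, `lasso_cd` and `hal_predict` are hypothetical and are not taken from any HAL software.

```python
# Toy 1-D sketch: regress y on indicator basis functions 1(x >= knot)
# with an L1 penalty on the coefficients. Not a reference implementation.

def hal_basis(xs, knots):
    """Design matrix of indicators: phi_j(x) = 1 if x >= knot_j else 0."""
    return [[1.0 if x >= k else 0.0 for k in knots] for x in xs]

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # The intercept is not penalized: recentre the residuals.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return b0, beta

def hal_predict(x, b0, beta, knots):
    """Prediction is the intercept plus the coefficients of active indicators."""
    return b0 + sum(b for b, k in zip(beta, knots) if x >= k)

# Toy data (an assumption for illustration): a single jump at x = 0.5.
xs = [i / 10 for i in range(10)]
ys = [0.0] * 5 + [1.0] * 5
X = hal_basis(xs, xs)
b0, beta = lasso_cd(X, ys, lam=0.001)
l1_norm = sum(abs(b) for b in beta)  # plays the role of the variation-norm bound
```

A larger `lam` corresponds to a smaller bound on the sum of absolute coefficients; with a very large penalty every coefficient is zero and the fit reduces to the sample mean.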

2017-09-14 — eltmle: Ensemble learning targeted maximum likelihood estimation

Authors: Miguel Angel Luque Fernandez
Year: 2017
Publication Date: 2017-09-14
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2017-08-30 — Finite Sample Inference for Targeted Learning

Authors: M. Laan
Year: 2017
Publication Date: 2017-08-30
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
The Highly-Adaptive-Lasso (HAL)-TMLE is an efficient estimator of a pathwise differentiable parameter in a statistical model that at a minimum (and possibly only) assumes that the sectional variation norms of the true nuisance parameters are finite. It relies on an initial estimator (HAL-MLE) of the nuisance parameters by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm is bounded by a constant, where this constant can be selected with cross-validation. In the formulation of the HAL-MLE this sectional variation norm corresponds with the sum of the absolute values of the coefficients for an indicator basis. Due to its reliance on machine learning, statistical inference for the TMLE has been based on its normal limit distribution, thereby potentially ignoring a large second order remainder in finite samples. In this article, we present four methods for construction of a finite sample 0.95-confidence interval that use the nonparametric bootstrap to estimate the finite sample distribution of the HAL-TMLE or a conservative distribution dominating the true finite sample distribution. We prove that it consistently estimates the optimal normal limit distribution, while its approximation error is driven by the performance of the bootstrap for a well behaved empirical process. We demonstrate our general inferential methods for 1) nonparametric estimation of the average treatment effect based on observing on each unit a covariate vector, binary treatment, and outcome, and for 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution.

2017-08-18 — Stacked generalization: an introduction to super learning

Authors: A. Naimi, L. Balzer
Year: 2017
Publication Date: 2017-08-18
Venue: bioRxiv
DOI: 10.1007/s10654-018-0390-z
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into a host of methods among which is the “Super Learner”. Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although relatively simple in nature, use of Super Learner by epidemiologists has been hampered by limitations in understanding conceptual and technical details. We work step-by-step through two examples to illustrate concepts and address common concerns.
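
The weighting step described in this abstract can be sketched in a few lines. This is a minimal illustration in plain Python, not the SuperLearner package's implementation: the two candidate learners (`fit_mean`, `fit_line`), the grid search over a single convex weight, and the toy data are all assumptions made for the example, and the objective is fixed to mean squared error.

```python
# Sketch of the Super Learner idea: V-fold cross-validation produces
# out-of-fold predictions for each candidate, and a convex weight is chosen
# to minimize cross-validated MSE. All names are hypothetical.

def fit_mean(xs, ys):
    """Candidate 1: predict the training mean everywhere."""
    mu = sum(ys) / len(ys)
    return lambda x: mu

def fit_line(xs, ys):
    """Candidate 2: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def cv_predictions(fitters, xs, ys, v=5):
    """preds[k][i] is learner k's prediction for point i, fit without i's fold."""
    n = len(xs)
    preds = [[0.0] * n for _ in fitters]
    for fold in range(v):
        test = [i for i in range(n) if i % v == fold]
        train = [i for i in range(n) if i % v != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for k, fitter in enumerate(fitters):
            model = fitter(tx, ty)
            for i in test:
                preds[k][i] = model(xs[i])
    return preds

def super_learner(xs, ys, v=5, grid=101):
    """Return the ensemble predictor and the weights (w_mean, w_line)."""
    fitters = [fit_mean, fit_line]
    z = cv_predictions(fitters, xs, ys, v)

    def cv_mse(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(z[0], z[1], ys)) / len(ys)

    # Grid search over convex weights; a stand-in for a proper optimizer.
    best_w = min((j / (grid - 1) for j in range(grid)), key=cv_mse)
    models = [f(xs, ys) for f in fitters]  # refit each candidate on all data

    def predict(x):
        return best_w * models[0](x) + (1 - best_w) * models[1](x)

    return predict, (best_w, 1 - best_w)

# Toy data (an assumption for illustration): an exactly linear relationship.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
predict, weights = super_learner(xs, ys)
```

On this toy data the cross-validated error is minimized by placing all weight on the linear candidate, which is the behaviour the abstract describes: the weights are chosen by out-of-fold performance, not by fit on the training data.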

2017-07-26 — biotmle: Targeted Learning for Biomarker Discovery

Authors: N. Hejazi, Weixin Cai, A. Hubbard
Year: 2017
Publication Date: 2017-07-26
Venue: Journal of Open Source Software
DOI: 10.21105/JOSS.00295
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
The biotmle package provides an implementation of a biomarker discovery methodology based on targeted minimum loss-based estimation (TMLE) (van der Laan and Rose 2011) and a generalization of the moderated t-statistic of Smyth (2004), designed for use with biological sequencing data (e.g., microarrays, RNA-seq). The statistical approach made available in this package relies on the use of TMLE to rigorously evaluate the association between a set of potential biomarkers and another variable of interest while adjusting for potential confounding from another set of user-specified covariates. The implementation is in the form of a package for the R language for statistical computing (R Core Team 2017).

2017-07-18 — On adaptive propensity score truncation in causal inference

Authors: Cheng Ju, Joshua Schwab, M. J. van der Laan
Year: 2017
Publication Date: 2017-07-18
Venue: Statistical Methods in Medical Research
DOI: 10.1177/0962280218774817
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
The positivity assumption, or the experimental treatment assignment (ETA) assumption, is important for identifiability in causal inference. Even if the positivity assumption holds, practical violations of this assumption may jeopardize the finite sample performance of the causal estimator. One of the consequences of practical violations of the positivity assumption is extreme values in the estimated propensity score (PS). A common practice to address this issue is truncating the PS estimate when constructing PS-based estimators. In this study, we propose a novel adaptive truncation method, Positivity-C-TMLE, based on the collaborative targeted maximum likelihood estimation (C-TMLE) methodology. We demonstrate the outstanding performance of our novel approach in a variety of simulations by comparing it with other commonly studied estimators. Results show that by adaptively truncating the estimated PS with a more targeted objective function, the Positivity-C-TMLE estimator achieves the best performance for both point estimation and confidence interval coverage among all estimators considered.
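
The "common practice" this abstract refers to, truncating the estimated propensity score before building a PS-based estimator, can be sketched as follows. This shows fixed-bound truncation inside a simple inverse-probability-weighted (IPW) estimator, not the adaptive Positivity-C-TMLE method the paper proposes. The bounds 0.025 and 0.975, the function names, and the toy data are assumptions chosen for illustration.

```python
# Sketch of fixed-bound propensity score truncation in an IPW estimator.
# Names, bounds and data are hypothetical.

def truncate(ps, lower=0.025, upper=0.975):
    """Clip each estimated propensity score into [lower, upper]."""
    return [min(max(p, lower), upper) for p in ps]

def ipw_ate(y, a, ps):
    """IPW estimate of the average treatment effect E[Y(1)] - E[Y(0)]."""
    n = len(y)
    treated = sum(ai * yi / pi for yi, ai, pi in zip(y, a, ps)) / n
    control = sum((1 - ai) * yi / (1 - pi) for yi, ai, pi in zip(y, a, ps)) / n
    return treated - control

# Toy data: the third unit is treated despite a near-zero estimated
# propensity score, a practical positivity violation.
y = [1.0, 0.0, 1.0, 1.0]
a = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.001, 0.5]

raw = ipw_ate(y, a, ps)              # dominated by the weight 1 / 0.001
trunc = ipw_ate(y, a, truncate(ps))  # the extreme weight is capped at 1 / 0.025
```

Truncation trades variance for bias: the capped estimate is far less extreme, but the choice of bound is arbitrary, which is the motivation the abstract gives for selecting it adaptively.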

2017-07-01 — ELTMLE: Stata module to provide Ensemble Learning Targeted Maximum Likelihood Estimation

Authors: M. Luque-Fernández
Year: 2017
Publication Date: 2017-07-01
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2017-06-24 — Estimating the Comparative Effectiveness of Feeding Interventions in the Pediatric Intensive Care Unit: A Demonstration of Longitudinal Targeted Maximum Likelihood Estimation

Authors: N. Kreif, Linh Tran, R. Grieve, B. D. De Stavola, R. Tasker, M. Petersen
Year: 2017
Publication Date: 2017-06-24
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kwx213
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Longitudinal data sources offer new opportunities for the evaluation of sequential interventions. To adjust for time-dependent confounding in these settings, longitudinal targeted maximum likelihood based estimation (TMLE), a doubly robust method that can be coupled with machine learning, has been proposed. This paper provides a tutorial in applying longitudinal TMLE, in contrast to inverse probability of treatment weighting and g-computation based on iterative conditional expectations. We apply these methods to estimate the causal effect of nutritional interventions on clinical outcomes among critically ill children in a United Kingdom study (Control of Hyperglycemia in Paediatric Intensive Care, 2008–2011). We estimate the probability of a child’s being discharged alive from the pediatric intensive care unit by a given day, under a range of static and dynamic feeding regimes. We find that before adjustment, patients who follow the static regime “never feed” are discharged by the end of the fifth day with a probability of 0.88 (95% confidence interval: 0.87, 0.90), while for the patients who follow the regime “feed from day 3,” the probability of discharge is 0.64 (95% confidence interval: 0.62, 0.66). After adjustment for time-dependent confounding, most of this difference disappears, and the statistical methods produce similar results. TMLE offers a flexible estimation approach; hence, we provide practical guidance on implementation to encourage its wider use.

2017-06-23 — Longitudinal Mediation Analysis with Time-varying Mediators and Exposures, with Application to Survival Outcomes

Authors: Wenjing Zheng, M. J. van der Laan
Year: 2017
Publication Date: 2017-06-23
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2016-0006
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
In this paper, we study the effect of a time-varying exposure mediated by a time-varying intermediate variable. We consider general longitudinal settings, including survival outcomes. At a given time point, the exposure and mediator of interest are influenced by past covariates, mediators and exposures, and affect future covariates, mediators and exposures. Right censoring, if present, occurs in response to past history. To address the challenges in mediation analysis that are unique to these settings, we propose a formulation in terms of random interventions based on conditional distributions for the mediator. This formulation, in particular, allows for well-defined natural direct and indirect effects in the survival setting, and natural decomposition of the standard total effect. Upon establishing identifiability and the corresponding statistical estimands, we derive the efficient influence curves and establish their robustness properties. Applying Targeted Maximum Likelihood Estimation, we use these efficient influence curves to construct multiply robust and efficient estimators. We also present an inverse probability weighted estimator and a nested non-targeted substitution estimator for these parameters.

2017-06-15 — Estimating inverse probability weights using super learner when weight‐model specification is unknown in a marginal structural Cox model context

Authors: M. E. Karim, R. Platt
Year: 2017
Publication Date: 2017-06-15
Venue: Statistics in Medicine
DOI: 10.1002/sim.7266
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2017-06-08 — A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference

Authors: L. Balzer, Wenjing Zheng, M. Laan, M. Petersen
Year: 2017
Publication Date: 2017-06-08
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2017-06-08 — A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

Authors: L. Balzer, Wenjing Zheng, M. J. van der Laan, M. Petersen
Year: 2017
Publication Date: 2017-06-08
Venue: Statistical Methods in Medical Research
DOI: 10.1177/0962280218774936
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual’s covariates on another’s outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.

2017-05-27 — Targeted learning with daily EHR data

Authors: Oleg Sofrygin, Zheng Zhu, J. Schmittdiel, A. Adams, R. Grant, M. J. van der Laan, R. Neugebauer
Year: 2017
Publication Date: 2017-05-27
Venue: Statistics in Medicine
DOI: 10.1002/sim.8164
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Electronic health records (EHR) data provide a cost‐ and time‐effective opportunity to conduct cohort studies of the effects of multiple time‐point interventions in the diverse patient population found in real‐world clinical settings. Because the computational cost of analyzing EHR data at daily (or more granular) scale can be quite high, a pragmatic approach has been to partition the follow‐up into coarser intervals of pre‐specified length (eg, quarterly or monthly intervals). The feasibility and practical impact of analyzing EHR data at a granular scale has not been previously evaluated. We start filling these gaps by leveraging large‐scale EHR data from a diabetes study to develop a scalable targeted learning approach that allows analyses with small intervals. We then study the practical effects of selecting different coarsening intervals on inferences by reanalyzing data from the same large‐scale pool of patients. Specifically, we map daily EHR data into four analytic datasets using 90‐, 30‐, 15‐, and 5‐day intervals. We apply a semiparametric and doubly robust estimation approach, the longitudinal Targeted Minimum Loss‐Based Estimation (TMLE), to estimate the causal effects of four dynamic treatment rules with each dataset, and compare the resulting inferences. To overcome the computational challenges presented by the size of these data, we propose a novel TMLE implementation, the “long‐format TMLE,” and rely on the latest advances in scalable data‐adaptive machine‐learning software, xgboost and h2o, for estimation of the TMLE nuisance parameters.

2017-05-06 — Sequential Double Robustness in Right-Censored Longitudinal Models

Authors: Alexander Luedtke, Oleg Sofrygin, M. Laan, M. Carone
Year: 2017
Publication Date: 2017-05-06
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Consider estimating the G-formula for the counterfactual mean outcome under a given treatment regime in a longitudinal study. Bang and Robins provided an estimator for this quantity that relies on a sequential regression formulation of this parameter. This approach is doubly robust in that it is consistent if either the outcome regressions or the treatment mechanisms are consistently estimated. We define a stronger notion of double robustness, termed sequential double robustness, for estimators of the longitudinal G-formula. The definition emerges naturally from a more general definition of sequential double robustness for the outcome regression estimators. An outcome regression estimator is sequentially doubly robust (SDR) if, at each subsequent time point, either the outcome regression or the treatment mechanism is consistently estimated. This form of robustness is exactly what one would anticipate is attainable by studying the remainder term of a first-order expansion of the G-formula parameter. We show that a particular implementation of an existing procedure is SDR. We also introduce a novel SDR estimator, whose development involves a novel translation of ideas used in targeted minimum loss-based estimation to the infinite-dimensional setting.

2017-04-10 — Use of a machine learning framework to predict substance use disorder treatment success

Authors: L. Ación, D. Kelmansky, M. J. van der Laan, Ethan Sahker, DeShauna Jones, Stephan Arndt
Year: 2017
Publication Date: 2017-04-10
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0175383
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorders (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed.

2017-04-05 — The relative performance of ensemble methods with deep convolutional neural networks for image classification

Authors: Cheng Ju, Aurélien F. Bibaut, M. Laan
Year: 2017
Publication Date: 2017-04-05
Venue: Journal of Applied Statistics
DOI: 10.1080/02664763.2018.1441383
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Artificial neural networks have been successfully applied to a variety of machine learning tasks, including image recognition, semantic segmentation, and machine translation. However, few studies fully investigated ensembles of artificial neural networks. In this work, we investigated multiple widely used ensemble methods, including unweighted averaging, majority voting, the Bayes Optimal Classifier, and the (discrete) Super Learner, for image recognition tasks, with deep neural networks as candidate algorithms. We designed several experiments, with the candidate algorithms being the same network structure with different model checkpoints within a single training process, networks with same structure but trained multiple times stochastically, and networks with different structure. In addition, we further studied the overconfidence phenomenon of the neural networks, as well as its impact on the ensemble methods. Across all of our experiments, the Super Learner achieved best performance among all the ensemble methods in this study.

2017-03-29 — Super machine learning: improving accuracy and reducing variance of behaviour classification from accelerometry

Authors: M. Ladds, Adam Thompson, Julianna Kadar, David J Slip, David P Hocking, Robert G Harcourt
Year: 2017
Publication Date: 2017-03-29
Venue: Animal Biotelemetry
DOI: 10.1186/s40317-017-0123-1
Link: Semantic Scholar
Matched Keywords: super learner, super learning

Abstract:
Semi-automating the analyses of accelerometry data makes it possible to synthesize large data sets. However, when constructing activity budgets from accelerometry data, there are many methods to extract, analyse and report data and results. For instance, machine learning is a robust approach to classifying data. We used a new method, super learning, that combines base learners (different machine learning methods) in an optimal manner to achieve overall improved accuracy. Other facets of super learning include the number of behavioural categories to predict, the number of epochs (sample window size) used to split data for training and testing and the parameters on which to train the models. The super learner accurately classified behaviour categories with higher accuracy and lower variance than comparative models. For all models tested, using four behaviours, in comparison with six, achieved higher rates of accuracy. The number of epochs chosen also affected the accuracy with smaller epochs (7 and 13) performing better than longer epochs (25 and 75). Correct model selection, training and testing are imperative to creating reliable and valid classification models. To do so means model fitting must use a wide array of selection criteria. We evaluated a number of these including model, number of behaviours to classify and epoch length and then used a parameter grid search to implement the models. We found that all criteria tested contributed to the models’ overall accuracies. Fewer behaviour categories and shorter epoch length improved the performance of all models tested. The super learner classified behaviours with higher accuracy and lower variance than other models tested. However, when using this model, users need to consider the additional human and computational time required for implementation. Machine learning is a powerful method for classifying the behaviour of animals from accelerometers. Care and consideration of the modelling parameters evaluated in this study are essential when using this type of statistical analysis.

2017-03-22 — Computational health economics for identification of unprofitable health care enrollees

Authors: Sherri Rose, Savannah L. Bergquist, T. Layton
Year: 2017
Publication Date: 2017-03-22
Venue: Biostatistics
DOI: 10.1093/biostatistics/kxx012
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Health insurers may attempt to design their health plans to attract profitable enrollees while deterring unprofitable ones. Such insurers would not be delivering socially efficient levels of care by providing health plans that maximize societal benefit, but rather intentionally distorting plan benefits to avoid high‐cost enrollees, potentially to the detriment of health and efficiency. In this work, we focus on a specific component of health plan design at risk for health insurer distortion in the Health Insurance Marketplaces: the prescription drug formulary. We introduce an ensembled machine learning function to determine whether drug utilization variables are predictive of a new measure of enrollee unprofitability we derive, and thus vulnerable to distortions by insurers. Our implementation also contains a unique application‐specific variable selection tool. This study demonstrates that super learning is effective in extracting the relevant signal for this prediction problem, and that a small number of drug variables can be used to identify unprofitable enrollees. The results are both encouraging and concerning. While risk adjustment appears to have been reasonably successful at weakening the relationship between therapeutic‐class‐specific drug utilization and unprofitability, some classes remain predictive of insurer losses. The vulnerable enrollees whose prescription drug regimens include drugs in these classes may need special protection from regulators in health insurance market design.

2017-03-09 — eltmle: Ensemble Learning Targeted Maximum Likelihood Estimation

Authors: M. Luque
Year: 2017
Publication Date: 2017-03-09
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2017-03-07 — Scalable collaborative targeted learning for high-dimensional data

Authors: Cheng Ju, Susan Gruber, S. Lendle, A. Chambaz, J. Franklin, R. Wyss, S. Schneeweiss, M. J. van der Laan
Year: 2017
Publication Date: 2017-03-07
Venue: Statistical Methods in Medical Research
DOI: 10.1177/0962280217729845
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well-behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation procedure. The original instantiation of the collaborative targeted minimum loss-based estimation template can be presented as a greedy forward stepwise collaborative targeted minimum loss-based estimation algorithm. It does not scale well when the number p of covariates increases drastically. This motivates the introduction of a novel instantiation of the collaborative targeted minimum loss-based estimation template where the covariates are pre-ordered. Its time complexity is O(p) as opposed to the original O(p^2), a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb to develop other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another instantiation called SL-C-TMLE algorithm that enables the data-driven choice of the better pre-ordering strategy given the problem at hand. Its time complexity is O(p) as well. The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real world large electronic health database; and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy collaborative targeted minimum loss-based estimation algorithm is unacceptably slow. Simulation studies seem to indicate that our scalable collaborative targeted minimum loss-based estimation and SL-C-TMLE algorithms work well. All C-TMLEs are publicly available in a Julia software package.

2017-03-07 — Propensity score prediction for electronic healthcare databases using super learner and high-dimensional propensity score methods

Authors: Cheng Ju, Mary A Combs, S. Lendle, J. Franklin, R. Wyss, S. Schneeweiss, M. J. van der Laan
Year: 2017
Publication Date: 2017-03-07
Venue: Journal of Applied Statistics
DOI: 10.1080/02664763.2019.1582614
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a ‘library’ of candidate prediction models. While SL has been widely studied in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated the performance of SL in its ability to predict the propensity score (PS), the conditional probability of treatment assignment given baseline covariates, using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also proposed a novel strategy for prediction modeling that combines SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.

2017-01-26 — Using a network-based approach and targeted maximum likelihood estimation to evaluate the effect of adding pre-exposure prophylaxis to an ongoing test-and-treat trial

Authors: L. Balzer, Patrick C. Staples, J. Onnela, V. DeGruttola
Year: 2017
Publication Date: 2017-01-26
Venue: Clinical Trials
DOI: 10.1177/1740774516679666
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2017 — Targeted Maximum Likelihood Estimation for Evaluation of the Health Impacts of Air Pollution

Authors: V. Sarovar
Year: 2017
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2017 — Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies

Authors: Megan S. Schuler, Sherri Rose
Year: 2017
Venue: American Journal of Epidemiology
DOI: 10.1093/aje/kww165
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2016 (18 papers)
2016-11-29 — Semi-Parametric Estimation and Inference for the Mean Outcome of the Single Time-Point Intervention in a Causally Connected Population

Authors: Oleg Sofrygin, M. J. van der Laan
Year: 2016
Publication Date: 2016-11-29
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2016-0003
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
We study the framework for semi-parametric estimation and statistical inference for the sample average treatment-specific mean effects in observational settings where data are collected on a single network of possibly dependent units (e.g., in the presence of interference or spillover). Despite recent advances, many of the current statistical methods rely on estimation techniques that assume a particular parametric model for the outcome, even though some of the important statistical assumptions required by these methods are often violated in observational network settings. In this work we rely on recent methodological advances in the field of targeted maximum likelihood estimation (TMLE) and describe an estimation approach that permits for more realistic classes of data-generative models while providing valid inference in the context of observational network-dependent data. We start by assuming that the true data-generating distribution belongs to a large class of semi-parametric statistical models. We then impose some restrictions on the possible set of such distributions. For example, we assume that the dependence among the observed outcomes can be fully described by an observed network. We then show that under our modeling assumptions, our estimand can be described as a functional of the mixture of the observed data-generating distribution. With this key insight in mind, we describe the TMLE for possibly-dependent units as an iid data algorithm and we demonstrate the validity of our approach with a simulation study. Finally, we extend prior work towards estimation of novel causal parameters such as the unit-specific indirect and direct treatment effects under interference and the effects of interventions that modify the structure of the network.

2016-11-01 — Effect Estimation in Point-Exposure Studies with Binary Outcomes and High-Dimensional Covariate Data – A Comparison of Targeted Maximum Likelihood Estimation and Inverse Probability of Treatment Weighting

Authors: Menglan Pang, T. Schuster, K. Filion, M. Schnitzer, M. Eberg, R. Platt
Year: 2016
Publication Date: 2016-11-01
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2015-0034
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2016-08-17 — Application of targeted maximum likelihood estimation technique to assess the impact of prenatal exposure to nitrogen dioxide and ozone on stillbirth in a California cohort study

Authors: V. Sarovar, R. Basu, B. Malig, M. Laan, M. Petersen
Year: 2016
Publication Date: 2016-08-17
DOI: 10.1289/isee.2016.3455
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2016-06-29 — Practical targeted learning from large data sets by survey sampling

Authors: P. Bertail, A. Chambaz, Émilien Joly
Year: 2016
Publication Date: 2016-06-29
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We address the practical construction of asymptotic confidence intervals for smooth (i.e., path-wise differentiable), real-valued statistical parameters by targeted learning from independent and identically distributed data in contexts where sample size is so large that it poses computational challenges. We observe some summary measure of all data and select a sub-sample from the complete data set by Poisson rejective sampling with unequal inclusion probabilities based on the summary measures. Targeted learning is carried out from the easier to handle sub-sample. We derive a central limit theorem for the targeted minimum loss estimator (TMLE) which enables the construction of the confidence intervals. The inclusion probabilities can be optimized to reduce the asymptotic variance of the TMLE. We illustrate the procedure with two examples where the parameters of interest are variable importance measures of an exposure (binary or continuous) on an outcome. We also conduct a simulation study and comment on its results. Keywords: semiparametric inference; survey sampling; targeted minimum loss estimation (TMLE).

2016-06-16 — Genetic Risk and Longitudinal Disease Activity in Systemic Lupus Erythematosus Using Targeted Maximum Likelihood Estimation

Authors: M. Gianfrancesco, L. Balzer, K. Taylor, L. Trupin, J. Nititham, M. Seldin, A. Singer, L. Criswell, L. Barcellos
Year: 2016
Publication Date: 2016-06-16
Venue: Genes and Immunity
DOI: 10.1038/gene.2016.33
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Systemic lupus erythematosus (SLE) is a chronic autoimmune disease associated with genetic and environmental risk factors. However, the extent to which genetic risk is causally associated with disease activity is unknown. We utilized longitudinal targeted maximum likelihood estimation to estimate the causal association between a genetic risk score (GRS) comprising 41 established SLE variants and clinically important disease activity as measured by the validated Systemic Lupus Activity Questionnaire (SLAQ) in a multiethnic cohort of 942 individuals with SLE. We did not find evidence of a clinically important SLAQ score difference (>4.0) for individuals with a high GRS compared with those with a low GRS across nine time points after controlling for sex, ancestry, renal status, dialysis, disease duration, treatment, depression, smoking and education, as well as time-dependent confounding of missing visits. Individual single-nucleotide polymorphism (SNP) analyses revealed that 12 of the 41 variants were significantly associated with clinically relevant changes in SLAQ scores across time points eight and nine after controlling for multiple testing. Results based on sophisticated causal modeling of longitudinal data in a large patient cohort suggest that individual SLE risk variants may influence disease activity over time. Our findings also emphasize a role for other biological or environmental factors.

2016-05-01 — Variable Selection for Confounder Control, Flexible Modeling and Collaborative Targeted Minimum Loss-Based Estimation in Causal Inference

Authors: M. Schnitzer, J. Lok, Susan Gruber
Year: 2016
Publication Date: 2016-05-01
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2015-0017
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2016-05-01 — One-Step Targeted Minimum Loss-based Estimation Based on Universal Least Favorable One-Dimensional Submodels

Authors: M. J. van der Laan, Susan Gruber
Year: 2016
Publication Date: 2016-05-01
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2015-0054
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2016-04-04 — Targeted Maximum Likelihood Estimation for Pharmacoepidemiologic Research

Authors: Menglan Pang, T. Schuster, K. Filion, M. Eberg, R. Platt
Year: 2016
Publication Date: 2016-04-04
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000000487
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background: Targeted maximum likelihood estimation has been proposed for estimating marginal causal effects, and is robust to misspecification of either the treatment or outcome model. However, due perhaps to its novelty, targeted maximum likelihood estimation has not been widely used in pharmacoepidemiology. The objective of this study was to demonstrate targeted maximum likelihood estimation in a pharmacoepidemiological study with a high-dimensional covariate space, to incorporate the use of high-dimensional propensity scores into this method, and to compare the results to those of inverse probability weighting. Methods: We implemented the targeted maximum likelihood estimation procedure in a single-point exposure study of the use of statins and the 1-year risk of all-cause mortality postmyocardial infarction using data from the UK Clinical Practice Research Datalink. A range of known potential confounders were considered, and empirical covariates were selected using the high-dimensional propensity scores algorithm. We estimated odds ratios using targeted maximum likelihood estimation and inverse probability weighting with a variety of covariate selection strategies. Results: Through a real example, we demonstrated the double robustness of targeted maximum likelihood estimation. We showed that results with this method and inverse probability weighting differed when a large number of covariates were included in the treatment model. Conclusions: Targeted maximum likelihood can be used in high-dimensional covariate settings. In high-dimensional covariate settings, differences in results between targeted maximum likelihood and inverse probability weighted estimation are likely due to sensitivity to (near) positivity violations. Further investigations are needed to gain better understanding of the advantages and limitations of this method in pharmacoepidemiological studies.

2016-02-18 — Scalable Super Learning

Authors: Unknown
Year: 2016
Publication Date: 2016-02-18
Venue: Handbook of Big Data
DOI: 10.1201/b19567-26
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — tmle: An R Package for Targeted Maximum Likelihood Estimation

Authors: Susan Gruber, M. J. Laan
Year: 2016
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — Super Learner in Prediction

Authors: E. Polley, M. Laan
Year: 2016
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — TMLE for Marginal Structural Models Based on an Instrument

Authors: Boriska Toth, M. J. Laan
Year: 2016
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — The Highly Adaptive Lasso Estimator

Authors: David C. Benkeser, M. J. Laan
Year: 2016
Venue: International Conference on Data Science and Advanced Analytics
DOI: 10.1109/DSAA.2016.93
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — Targeted Maximum Likelihood Estimation of Natural Direct Effect

Authors: Wenjing Zheng, M. J. Laan
Year: 2016
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — Mortality Prediction in the ICU Based on MIMIC-II Results from the Super ICU Learner Algorithm (SICULA) Project

Authors: R. Pirracchio
Year: 2016
DOI: 10.1007/978-3-319-43742-2_20
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
MIMIC II dataset offers a unique opportunity to develop and validate new severity scores. Non-parametric approaches are needed to model ICU mortality. Prediction of hospital mortality based on the Super Learner achieves significantly improved performance, both in terms of calibration and discrimination, as compared to conventional severity scores.

2016 — Causal Conference Program 2016: Titles and Abstracts (Efficient Inference of Average Treatment Effects in High Dimensions via Approximate Residual Balancing; One-step Targeted MLE and the Highly Adaptive Lasso)

Authors: M. Laan, Stefan Wager, Sherri Rose, Mark van der Laan, Xiaoru Wu, M. Hudgens, David Choi, Fan Li, P. Rosenbaum, Joshua Angrist, Dylan S. Small, Jamie Robins, G. Chan, D. Rubin
Year: 2016
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — Applications of Targeted Maximum Likelihood Estimation

Authors: L. Balzer
Year: 2016
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2016 — SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

Authors: Michael Rosenblum
Year: 2016
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2015 (11 papers)
2015-11-26 — Second-Order Inference for the Mean of a Variable Missing at Random

Authors: I. Díaz, M. Carone, M. J. van der Laan
Year: 2015
Publication Date: 2015-11-26
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2015-0031
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation, tmle

Abstract:
We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE always had a coverage probability equal or closer to the nominal value 0.95, compared to its first-order counterpart. In the best-case scenario, the proposed second-order TMLE had a coverage probability of 0.86 when the first-order TMLE had a coverage probability of zero. We also present a novel first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The first-order estimator proposed is expected to have improved finite sample performance compared to existing first-order estimators. In the best-case scenario of our simulation study, the novel first-order TMLE improved the coverage probability from 0 to 0.90. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.

2015-09-30 — Occupational Exposure to PM2.5 and Incidence of Ischemic Heart Disease

Authors: Daniel Brown, M. Petersen, S. Costello, E. Noth, K. Hammond, M. Cullen, M. J. van der Laan, E. Eisen
Year: 2015
Publication Date: 2015-09-30
Venue: Epidemiology
DOI: 10.1097/EDE.0000000000000329
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Background: We investigated the incidence of ischemic heart disease (IHD) in relation to accumulated exposure to particulate matter (PM) in a cohort of aluminum workers. We adjusted for time varying confounding characteristic of the healthy worker survivor effect, using a recently introduced method for the estimation of causal target parameters. Methods: Applying longitudinal targeted minimum loss-based estimation, we estimated the difference in marginal cumulative risk of IHD in the cohort comparing counterfactual outcomes if always exposed above to always exposed below a PM2.5 exposure cut-off. Analyses were stratified by sub-cohort employed in either smelters or fabrication facilities. We selected two exposure cut-offs a priori, at the median and 10th percentile in each sub-cohort. Results: In smelters, the estimated IHD risk difference after 15 years of accumulating PM2.5 exposure during follow-up was 2.9% (0.6%, 5.1%) using the 10th percentile cut-off of 0.10 mg/m³. For fabrication workers, the difference was 2.5% (0.8%, 4.1%) at the 10th percentile of 0.06 mg/m³. Using the median exposure cut-off, results were similar in direction but smaller in size. We present marginal incidence curves describing the cumulative risk of IHD over the course of follow-up for each sub-cohort under each intervention regimen. Conclusions: The accumulation of exposure to PM2.5 appears to result in higher risks of IHD in both aluminum smelter and fabrication workers. This represents the first longitudinal application of targeted minimum loss-based estimation, a method for generating doubly robust semi-parametric efficient substitution estimators of causal parameters, in the fields of occupational and environmental epidemiology.

2015-09-28 — Targeted Maximum Likelihood Estimation for Network Data

Authors: Oleg Sofrygin, M. Laan
Year: 2015
Publication Date: 2015-09-28
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2015-09-01 — Balancing Score Adjusted Targeted Minimum Loss-based Estimation

Authors: S. Lendle, B. Fireman, M. J. van der Laan
Year: 2015
Publication Date: 2015-09-01
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2012-0012
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2015-06-08 — Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury

Authors: N. Kreif, R. Grieve, I. Díaz, D. Harrison
Year: 2015
Publication Date: 2015-06-08
Venue: Health Economics
DOI: 10.1002/hec.3189
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
For a continuous treatment, the generalised propensity score (GPS) is defined as the conditional density of the treatment, given covariates. GPS adjustment may be implemented by including it as a covariate in an outcome regression. Here, the unbiased estimation of the dose–response function assumes correct specification of both the GPS and the outcome‐treatment relationship. This paper introduces a machine learning method, the ‘Super Learner’, to address model selection in this context. In the two‐stage estimation approach proposed, the Super Learner selects a GPS and then a dose–response function conditional on the GPS, as the convex combination of candidate prediction algorithms. We compare this approach with parametric implementations of the GPS and to regression methods. We contrast the methods in the Risk Adjustment in Neurocritical care cohort study, in which we estimate the marginal effects of increasing transfer time from emergency departments to specialised neuroscience centres, for patients with acute traumatic brain injury. With parametric models for the outcome, we find that dose–response curves differ according to choice of specification. With the Super Learner approach to both regression and the GPS, we find that transfer time does not have a statistically significant marginal effect on the outcomes.

2015-05-29 — Longitudinal Targeted Maximum Likelihood Estimation

Authors: Joshua Schwab, S. Lendle, M. Petersen, M. Laan
Year: 2015
Publication Date: 2015-05-29
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2015-05-01 — Super Learner Analysis of Electronic Adherence Data Improves Viral Prediction and May Provide Strategies for Selective HIV RNA Monitoring

Authors: M. Petersen, E. LeDell, Joshua Schwab, V. Sarovar, R. Gross, N. Reynolds, J. Haberer, K. Goggin, C. Golin, J. Arnsten, M. Rosen, R. Remien, David Etoori, I. Wilson, J. Simoni, J. Erlen, M. J. van der Laan, Honghu Liu, D. Bangsberg
Year: 2015
Publication Date: 2015-05-01
Venue: Journal of Acquired Immune Deficiency Syndromes
DOI: 10.1097/QAI.0000000000000548
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2015 — Using the tmle.npvi R package

Authors: A. Chambaz, P. Neuvial
Year: 2015
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2015 — Targeted Minimum Loss Based Estimation: Applications and Extensions in Causal Inference and Big Data

Authors: S. Lendle
Year: 2015
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2015 — Sample R code for rare outcomes TMLE

Authors: L. Balzer
Year: 2015
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2015 — Improving Propensity Score Estimators' Robustness to Model Misspecification Using Super Learner

Authors: R. Pirracchio, M. Petersen, M. V. D. Laan
Year: 2015
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2014 (10 papers)
2014-11-10 — Super Learning Day at Harrogate Grammar School

Authors: T. Cook
Year: 2014
Publication Date: 2014-11-10
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2014-09-01 — Nouveautés en modélisation non paramétrique - Apports du Super Learner

Authors: R. Pirracchio
Year: 2014
Publication Date: 2014-09-01
DOI: 10.1016/J.RESPE.2014.06.004
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2014-07-31 — EFFECT OF BREASTFEEDING ON GASTROINTESTINAL INFECTION IN INFANTS: A TARGETED MAXIMUM LIKELIHOOD APPROACH FOR CLUSTERED LONGITUDINAL DATA.

Authors: M. Schnitzer, M. J. van der Laan, E. Moodie, R. Platt
Year: 2014
Publication Date: 2014-07-31
Venue: Annals of Applied Statistics
DOI: 10.1214/14-AOAS727
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
The PROmotion of Breastfeeding Intervention Trial (PROBIT) cluster-randomized a program encouraging breastfeeding to new mothers in hospital centers. The original studies indicated that this intervention successfully increased duration of breastfeeding and lowered rates of gastrointestinal tract infections in newborns. Additional scientific and popular interest lies in determining the causal effect of longer breastfeeding on gastrointestinal infection. In this study, we estimate the expected infection count under various lengths of breastfeeding in order to estimate the effect of breastfeeding duration on infection. Due to the presence of baseline and time-dependent confounding, specialized "causal" estimation methods are required. We demonstrate the double-robust method of Targeted Maximum Likelihood Estimation (TMLE) in the context of this application and review some related methods and the adjustments required to account for clustering. We compare TMLE (implemented both parametrically and using a data-adaptive algorithm) to other causal methods for this example. In addition, we conduct a simulation study to determine (1) the effectiveness of controlling for clustering indicators when cluster-specific confounders are unmeasured and (2) the importance of using data-adaptive TMLE.

2014-06-30 — Cervical Cancer Precursors and Hormonal Contraceptive Use in HIV-Positive Women: Application of a Causal Model and Semi-Parametric Estimation Methods

Authors: H. Leslie, D. Karasek, L. F. Harris, Emily G Chang, N. Abdulrahim, May Maloba, M. Huchko
Year: 2014
Publication Date: 2014-06-30
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0101090
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Objective: To demonstrate the application of causal inference methods to observational data in the obstetrics and gynecology field, particularly causal modeling and semi-parametric estimation. Background: Human immunodeficiency virus (HIV)-positive women are at increased risk for cervical cancer and its treatable precursors. Determining whether potential risk factors such as hormonal contraception are true causes is critical for informing public health strategies as longevity increases among HIV-positive women in developing countries. Methods: We developed a causal model of the factors related to combined oral contraceptive (COC) use and cervical intraepithelial neoplasia 2 or greater (CIN2+) and modified the model to fit the observed data, drawn from women in a cervical cancer screening program at HIV clinics in Kenya. Assumptions required for substantiation of a causal relationship were assessed. We estimated the population-level association using semi-parametric methods: g-computation, inverse probability of treatment weighting, and targeted maximum likelihood estimation. Results: We identified 2 plausible causal paths from COC use to CIN2+: via HPV infection and via increased disease progression. Study data enabled estimation of the latter only with strong assumptions of no unmeasured confounding. Of 2,519 women under 50 screened per protocol, 219 (8.7%) were diagnosed with CIN2+. Marginal modeling suggested a 2.9% (95% confidence interval 0.1%, 6.9%) increase in prevalence of CIN2+ if all women under 50 were exposed to COC; the significance of this association was sensitive to method of estimation and exposure misclassification. Conclusion: Use of causal modeling enabled clear representation of the causal relationship of interest and the assumptions required to estimate that relationship from the observed data. Semi-parametric estimation methods provided flexibility and reduced reliance on correct model form. Although selected results suggest an increased prevalence of CIN2+ associated with COC, evidence is insufficient to conclude causality. Priority areas for future studies to better satisfy causal criteria are identified.

2014-06-18 — Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal Marginal Structural Working Models

Authors: M. Petersen, Joshua Schwab, Susan Gruber, N. Blaser, M. Schomaker, M. J. van der Laan
Year: 2014
Publication Date: 2014-06-18
Venue: Journal of Causal Inference
DOI: 10.1515/jci-2013-0007
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2014-06-02 — Targeted Maximum Likelihood Estimation using Exponential Families

Authors: I. Díaz, Michael Rosenblum
Year: 2014
Publication Date: 2014-06-02
Venue: The International Journal of Biostatistics
DOI: 10.1515/ijb-2014-0039
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Targeted maximum likelihood estimation (TMLE) is a general method for estimating parameters in semiparametric and nonparametric models. The key step in any TMLE implementation is constructing a sequence of least-favorable parametric models for the parameter of interest. This has been done for a variety of parameters arising in causal inference problems, by augmenting standard regression models with a “clever-covariate.” That approach requires deriving such a covariate for each new type of problem; for some problems such a covariate does not exist. To address these issues, we give a general TMLE implementation based on exponential families. This approach does not require deriving a clever-covariate, and it can be used to implement TMLE for estimating any smooth parameter in the nonparametric model. A computational advantage is that each iteration of TMLE involves estimation of a parameter in an exponential family, which is a convex optimization problem for which software implementing reliable and computationally efficient methods exists. We illustrate the method in three estimation problems, involving the mean of an outcome missing at random, the parameter of a median regression model, and the causal effect of a continuous exposure, respectively. We conduct a simulation study comparing different choices for the parametric submodel. We find that the choice of submodel can have an important impact on the behavior of the estimator in finite samples.
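Illustrative note (not from the paper): the sketch below shows the standard "clever-covariate" style of TMLE update that the abstract contrasts with, applied to the mean of an outcome missing at random. It is not the exponential-family implementation the authors propose. Everything in it is an assumption made for illustration: the simulated data, the known missingness probability `g`, the deliberately misspecified constant initial fit `Q_init`, and the use of a linear fluctuation with squared-error loss.

```python
import numpy as np

# Simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20_000
W = rng.normal(size=n)                      # baseline covariate
Y = 1.0 + 2.0 * W + rng.normal(size=n)      # outcome; its true mean is 1.0
g = 1.0 / (1.0 + np.exp(-(0.5 + W)))        # P(Y observed | W), assumed known
observed = rng.random(n) < g                # missing-at-random indicator

# Complete-case mean: biased, because observation depends on W.
naive = Y[observed].mean()

# Initial outcome regression, deliberately misspecified as a constant.
Q_init = np.full(n, naive)

# Clever covariate for the mean under missingness: H(W) = 1 / g(W).
H = 1.0 / g

# Fluctuation step: least-squares fit of the observed residuals on H
# (a one-dimensional linear submodel Q_init + eps * H).
resid = Y[observed] - Q_init[observed]
eps = np.sum(H[observed] * resid) / np.sum(H[observed] ** 2)

# Updated regression and the substitution (plug-in) estimate.
Q_star = Q_init + eps * H
tmle = Q_star.mean()

# After the update, the weighted residuals among observed units sum to zero.
score = np.sum(H[observed] * (Y[observed] - Q_star[observed])) / n

print(f"naive={naive:.3f}  tmle={tmle:.3f}  score={score:.2e}")
```

Because `g` is taken as known and correct here, the updated estimate recovers the true mean even though the initial fit is wrong, which is one side of the double robustness several abstracts on this page mention. In practice `g` would be estimated, and a logistic fluctuation is often used for bounded outcomes.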

2014-04-22 — R Package multiPIM: A Causal Inference Approach to Variable Importance Analysis

Authors: Stephan Ritter, N. Jewell, A. Hubbard
Year: 2014
Publication Date: 2014-04-22
DOI: 10.18637/JSS.V057.I08
Link: Semantic Scholar
Matched Keywords: super learner, tmle

Abstract:
We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including super learner, a meta-learner which combines several different algorithms into one. We describe a simulation in which the double robust TMLE is compared to the graphical computation estimator. We also provide example analyses using two data sets which are included with the package.

2014-03-26 — Targeted learning: From MLE to TMLE

Authors: M. Laan
Year: 2014
Publication Date: 2014-03-26
DOI: 10.1201/B16720-45
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2014-03-01 — Modeling the impact of hepatitis C viral clearance on end‐stage liver disease in an HIV co‐infected cohort with targeted maximum likelihood estimation

Authors: M. Schnitzer, E. Moodie, M. J. van der Laan, R. Platt, M. Klein
Year: 2014
Publication Date: 2014-03-01
Venue: Biometrics
DOI: 10.1111/biom.12105
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2014 — Higher-order Targeted Minimum Loss-based Estimation

Authors: M. Carone, I. Díaz, M. J. Laan
Year: 2014
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2013 (11 papers)
2013-11-29 — Targeted Minimum Loss-Based Estimation of Causal Effects in Right-Censored Survival Data with Time-Dependent Covariates: Warfarin, Stroke, and Death in Atrial Fibrillation

Authors: J. Brooks, M. J. van der Laan, D. Singer, A. Go
Year: 2013
Publication Date: 2013-11-29
DOI: 10.1515/jci-2013-0001
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2013-10-04 — Incidence, risk factors and prediction of post-operative acute kidney injury following cardiac surgery for active infective endocarditis: an observational study

Authors: M. Legrand, R. Pirracchio, Anne Rosa, M. Petersen, M. J. van der Laan, J. Fabiani, M. Fernandez-Gerlinger, I. Podglajen, D. Safran, B. Cholley, J. Mainardi
Year: 2013
Publication Date: 2013-10-04
Venue: Critical Care
DOI: 10.1186/cc13041
Link: Semantic Scholar
Matched Keywords: super learning, targeted maximum likelihood estimation

Abstract:
Introduction: Cardiac surgery is frequently needed in patients with infective endocarditis (IE). Acute kidney injury (AKI) often complicates IE and is associated with poor outcomes. The purpose of the study was to determine the risk factors for post-operative AKI in patients operated on for IE. Methods: A retrospective, non-interventional study of prospectively collected data (2000–2010) included patients with IE and cardiac surgery with cardio-pulmonary bypass. The primary outcome was post-operative AKI, defined as the development of AKI or progression of AKI based on the acute kidney injury network (AKIN) definition. We used ensemble machine learning (“Super Learning”) to develop a predictor of AKI based on potential risk factors, and evaluated its performance using V-fold cross validation. We identified clinically important predictors among a set of risk factors using Targeted Maximum Likelihood Estimation. Results: 202 patients were included, of which 120 (59%) experienced a post-operative AKI. 65 (32.2%) patients presented an AKI before surgery while 91 (45%) presented a progression of AKI in the post-operative period. 20 patients (9.9%) required a renal replacement therapy during the post-operative ICU stay and 30 (14.8%) died during their hospital stay. The following variables were found to be significantly associated with renal function impairment, after adjustment for other risk factors: multiple surgery (OR: 4.16, 95% CI: 2.98-5.80, p<0.001), pre-operative anemia (OR: 1.89, 95% CI: 1.34-2.66, p<0.001), transfusion requirement during surgery (OR: 2.38, 95% CI: 1.55-3.63, p<0.001), and the use of vancomycin (OR: 2.63, 95% CI: 2.07-3.34, p<0.001), aminoglycosides (OR: 1.44, 95% CI: 1.13-1.83, p=0.004) or contrast iodine (OR: 1.70, 95% CI: 1.37-2.12, p<0.001). Post-operative but not pre-operative AKI was associated with hospital mortality. Conclusions: Post-operative AKI following cardiopulmonary bypass for IE results from additive hits to the kidney. We identified several potentially modifiable risk factors such as treatment with vancomycin or aminoglycosides or pre-operative anemia.

2013-08-01 — Targeted maximum likelihood estimation in safety analysis.

Authors: S. Lendle, B. Fireman, M. J. van der Laan
Year: 2013
Publication Date: 2013-08-01
Venue: Journal of Clinical Epidemiology
DOI: 10.1016/j.jclinepi.2013.02.017
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2013-08-01 — Super learning to hedge against incorrect inference from arbitrary parametric assumptions in marginal structural modeling.

Authors: R. Neugebauer, B. Fireman, Jason A. Roy, M. Raebel, G. Nichols, P. O’Connor
Year: 2013
Publication Date: 2013-08-01
Venue: Journal of Clinical Epidemiology
DOI: 10.1016/j.jclinepi.2013.01.016
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2013-05-29 — A Marginal Structural Modeling Approach with Super Learning for a Study on Oral Bisphosphonate Therapy and Atrial Fibrillation

Authors: Romain Neugebauer, M. Chandra, Antonio Paredes, David J. Graham, Carolyn McCloskey, Alan S. Go
Year: 2013
Publication Date: 2013-05-29
DOI: 10.1515/JCI-2012-0003
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2013 — Targeted maximum likelihood estimation for marginal time-dependent treatment effects under density misspecification.

Authors: M. Schnitzer, E. Moodie, R. Platt
Year: 2013
Venue: Biostatistics
DOI: 10.1093/biostatistics/kxs024
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2013 — R Lab 5 - TMLE (data)

Authors: Laura Balzer
Year: 2013
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2013 — R Lab 5 - TMLE

Authors: L. Balzer, M. Petersen, Alexander Luedtke
Year: 2013
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2013 — EFFECT OF SUPER LEARNING TECHNIQUES ON STUDENTS ACADEMIC ACHIEVEMENT IN ENGLISH SUBJECT AT SECONDARY LEVEL IN KHYBER PAKHTUNKHWA

Authors: M. Ayaz, Malik Amer Atta
Year: 2013
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2013 — An Overview of Targeted Maximum Likelihood Estimation

Authors: Susan Gruber
Year: 2013
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2013 — A Marginal Structural Modeling Approach with Super Learning for a Study on Oral Bisphosphonate Therapy and Atrial Fibrillation

Authors: R. Neugebauer, M. Chandra, Antonio Paredes, David J. Graham, C. McCloskey, Alan S. Go
Year: 2013
DOI: 10.1515/jci-2012-0003
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2012 (14 papers)
2012-11-16 — tmle : An R Package for Targeted Maximum Likelihood Estimation

Authors: Susan Gruber, M. J. Laan
Year: 2012
Publication Date: 2012-11-16
DOI: 10.18637/JSS.V051.I13
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Targeted maximum likelihood estimation (TMLE) is a general approach for constructing an efficient double-robust semi-parametric substitution estimator of a causal effect parameter or statistical association measure. tmle is a recently developed R package that implements TMLE of the effect of a binary treatment at a single point in time on an outcome of interest, controlling for user supplied covariates, including an additive treatment effect, relative risk, odds ratio, and the controlled direct effect of a binary treatment controlling for a binary intermediate variable on the pathway from treatment to the outcome. Estimation of the parameters of a marginal structural model is also available. The package allows outcome data with missingness, and experimental units that contribute repeated records of the point-treatment data structure, thereby allowing the analysis of longitudinal data structures. Relevant factors of the likelihood may be modeled or fit data-adaptively according to user specifications, or passed in from an external estimation procedure. Effect estimates, variances, p values, and 95% confidence intervals are provided by the software.

2012-09-18 — A General Implementation of TMLE for Longitudinal Data Applied to Causal Inference in Survival Analysis

Authors: Ori Stitelman, V. De Gruttola, M. J. van der Laan
Year: 2012
Publication Date: 2012-09-18
Venue: The International Journal of Biostatistics
DOI: 10.1515/1557-4679.1334
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2012-07-12 — Targeted Minimum Loss-Based Estimation of a Causal Effect Using Interval-Censored Time-to-Event Data

Authors: M. Carone, M. Petersen, M. J. Laan
Year: 2012
Publication Date: 2012-07-12
DOI: 10.1201/B12290-11
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012-04-03 — Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data

Authors: A. Houssaïni, L. Assoumou, A. Marcelin, J. Molina, V. Calvez, P. Flandre
Year: 2012
Publication Date: 2012-04-03
Venue: AIDS Research and Treatment
DOI: 10.1155/2012/478467
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an “add-on” trial comparing the efficacy of adding didanosine to an on-going failing regimen. Our aim was also to investigate the impact on the use of different cross-validation strategies and different loss functions. Four different repartitions between training set and validation sets were tested through two loss functions. Six statistical methods were compared. We assess performance by evaluating R² values and accuracy by calculating the rates of patients being correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. Slight discrepancy arises between the two loss functions investigated, and slight difference arises also between results based on cross-validated risks and results from full dataset. The Super Learner methodology and linear model provided around 80% of patients correctly classified. The difference between the lower and higher rates is around 10 percent. The number of mutations retained in different learners also varies from one to 41. Conclusions. The more recent Super Learner methodology combining the prediction of many learners provided good performance on our small dataset.
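Illustrative note (not from the paper): the difference between the discrete Super Learner (pick the single learner with the lowest cross-validated risk) and the weighted-combination Super Learner can be sketched as follows. The simulated data, the three polynomial candidate learners, squared-error loss, five folds, and the coarse grid search over convex weights are all assumptions chosen to keep the example short. Real implementations typically use a proper constrained optimizer and a richer learner library.

```python
import itertools
import numpy as np

def fit_poly(x, y, degree):
    """Fit a polynomial of the given degree and return a prediction function."""
    coef = np.polyfit(x, y, degree)
    return lambda x_new: np.polyval(coef, x_new)

def cv_predictions(x, y, degrees, n_folds=5, seed=0):
    """Out-of-fold predictions, one column per candidate learner."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    Z = np.zeros((len(y), len(degrees)))
    for v in range(n_folds):
        train, valid = folds != v, folds == v
        for j, d in enumerate(degrees):
            Z[valid, j] = fit_poly(x[train], y[train], d)(x[valid])
    return Z

def simplex_grid(k, steps=20):
    """All weight vectors with k non-negative entries on a grid summing to 1."""
    for head in itertools.product(range(steps + 1), repeat=k - 1):
        if sum(head) <= steps:
            yield np.array(head + (steps - sum(head),)) / steps

# Simulated data (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=300)
y = np.sin(2.0 * x) + 0.5 * x**2 + rng.normal(scale=0.3, size=300)

degrees = (0, 1, 2)                      # candidate learners
Z = cv_predictions(x, y, degrees)

# Discrete Super Learner: the candidate with the smallest cross-validated MSE.
cv_risk = ((Z - y[:, None]) ** 2).mean(axis=0)
discrete_choice = int(np.argmin(cv_risk))

# Weighted Super Learner: convex weights minimizing cross-validated MSE.
weights = min(simplex_grid(len(degrees)),
              key=lambda w: np.mean((Z @ w - y) ** 2))
ensemble_risk = np.mean((Z @ weights - y) ** 2)

print("CV risk per learner:", np.round(cv_risk, 3))
print("discrete choice: degree", degrees[discrete_choice])
print("ensemble weights:", weights, "CV risk:", round(float(ensemble_risk), 3))
```

Because the grid contains the weight vectors that put all mass on a single learner, the combined predictor's cross-validated risk in this sketch can never exceed that of the discrete choice.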

2012 — Targeted Minimum Loss Based Estimation of Causal Effects of Multiple Time Point Interventions

Authors: Mark J. van der Laan, Susan Gruber
Year: 2012
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Targeted Minimum Loss Based Estimation of a Causal Effect on an Outcome with Known Conditional Bounds

Authors: Susan Gruber
Year: 2012
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Targeted Maximum Likelihood Estimation of Natural Direct Effects

Authors: Wenjing Zheng
Year: 2012
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Targeted Maximum Likelihood Estimation for Prediction Calibration

Authors: J. Brooks, Mark J. van der Laan, A. Go
Year: 2012
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequentially Randomized Controlled Trials

Authors: P. Chaffee
Year: 2012
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Super Learner Based Conditional Density Estimation with Application to Marginal Structural Models

Authors: Iván Díaz Muñoz
Year: 2012
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Targeted Minimum Loss Based Estimation for Longitudinal Data

Authors: P. Chaffee
Year: 2012
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Super Learner and Targeted Maximum Likelihood Estimation for Longitudinal Data Structures with Applications to Atrial Fibrillation

Authors: J. Brooks
Year: 2012
Link: Semantic Scholar
Matched Keywords: super learner, targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2012 — Estimation of a non-parametric variable importance measure of a continuous exposure

Authors: A. Chambaz, P. Neuvial, M. J. van der Laan
Year: 2012
Venue: Electronic Journal of Statistics
DOI: 10.1214/12-EJS703
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x(0) with positive mass and a continuum of other levels. For the purpose of estimating it, we fully develop the semi-parametric estimation methodology called targeted minimum loss estimation methodology (TMLE) [23, 22]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure which is at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation study and application to a genomic example that originally motivated this article. In the latter, the exposure X and response Y are, respectively, the DNA copy number and expression level of a given gene in a cancer cell. Here, the reference level is x(0) = 2, that is the expected DNA copy number in a normal cell. The confounder is a measure of the methylation of the gene. The fact that there is no clear biological indication that X and Y can be interpreted as an exposure and a response, respectively, is not problematic.

2012 — Application of Targeted Maximum Likelihood Estimation to the Meta-Analysis of Safety Data

Authors: Susan Gruber, M. J. Laan
Year: 2012
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2011 (16 papers)
2011-07-29 — Dimension reduction with gene expression data using targeted variable importance measurement

Authors: Hui Wang, M. J. Laan
Year: 2011
Publication Date: 2011-07-29
Venue: BMC Bioinformatics
DOI: 10.1186/1471-2105-12-312
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation, tmle

Abstract:
Background. When a large number of candidate variables are present, a dimension reduction procedure is usually conducted to reduce the variable space before the subsequent analysis is carried out. The goal of dimension reduction is to find a list of candidate genes with a more operable length ideally including all the relevant genes. Leaving many uninformative genes in the analysis can lead to biased estimates and reduced power. Therefore, dimension reduction is often considered a necessary predecessor of the analysis because it can not only reduce the cost of handling numerous variables, but also has the potential to improve the performance of the downstream analysis algorithms. Results. We propose a TMLE-VIM dimension reduction procedure based on the variable importance measurement (VIM) in the framework of targeted maximum likelihood estimation (TMLE). TMLE is an extension of maximum likelihood estimation targeting the parameter of interest. TMLE-VIM is a two-stage procedure. The first stage resorts to a machine learning algorithm, and the second stage improves the first-stage estimation with respect to the parameter of interest. Conclusions. We demonstrate with simulations and data analyses that our approach not only enjoys the prediction power of machine learning algorithms, but also accounts for the correlation structures among variables and therefore produces better variable rankings. When utilized in dimension reduction, TMLE-VIM can help to obtain the shortest possible list with the most truly associated variables.

2011-07-01 — Super Learning: Praktik belajar Mengajar yang serba efektif dan mencerdaskan [Super Learning: Thoroughly Effective and Enlightening Teaching and Learning Practice]

Authors: A. k
Year: 2011
Publication Date: 2011-07-01
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — TMLE in Adaptive Group Sequential Covariate-Adjusted RCTs

Authors: A. Chambaz, M. J. Laan
Year: 2011
DOI: 10.1007/978-1-4419-9782-1_29
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Targeted Maximum Likelihood Estimation of Effect Modification Parameters in Survival Analysis

Authors: Victor De Gruttola, Mark J. van der Laan
Year: 2011
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Collaborative Double Robust Targeted Maximum Likelihood Estimation

Authors: M. J. van der Laan, Susan Gruber
Year: 2011
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — An Application of Collaborative Targeted Maximum Likelihood Estimation in Causal Inference and Genomics

Authors: Susan Gruber
Year: 2011
Venue: The International Journal of Biostatistics
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome

Authors: M. J. Laan, Susan Gruber
Year: 2011
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Targeted Minimum Loss Based Estimation Based on Directly Solving the Efficient Influence Curve Equation

Authors: P. Chaffee, M. J. Laan
Year: 2011
Link: Semantic Scholar
Matched Keywords: targeted minimum loss based estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Targeted Maximum Likelihood Estimation of Conditional Relative Risk in a Semi-parametric Regression Model

Authors: C. Tuglus, Kristin E. Porter, M. J. Laan
Year: 2011
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Super Learning for Right-Censored Data

Authors: E. Polley, M. Laan
Year: 2011
DOI: 10.1007/978-1-4419-9782-1_16
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Super Learning: An Application to the Prediction of HIV-1 Drug Resistance

Authors: E. Polley, M. Petersen, Soo-Yon Rhee
Year: 2011
Venue: Statistical Applications in Genetics and Molecular Biology
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Propensity-Score-Based Estimators and C-TMLE

Authors: J. Sekhon, Susan Gruber, Kristin E. Porter, M. J. Laan
Year: 2011
DOI: 10.1007/978-1-4419-9782-1_21
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Introduction to TMLE

Authors: Sherri Rose, M. J. Laan
Year: 2011
DOI: 10.1007/978-1-4419-9782-1_4
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — Foundations of TMLE

Authors: M. J. Laan, Sherri Rose
Year: 2011
DOI: 10.1007/978-1-4419-9782-1_30
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — C-TMLE of an Additive Point Treatment Effect

Authors: Susan Gruber, M. J. Laan
Year: 2011
DOI: 10.1007/978-1-4419-9782-1_19
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2011 — C-TMLE for Time-to-Event Outcomes

Authors: Ori Stitelman, M. J. Laan
Year: 2011
DOI: 10.1007/978-1-4419-9782-1_20
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.

2010 (6 papers)
2010 — Targeted maximum likelihood estimation techniques for time to event data and the implications of coarsening an explanatory variable of interest via dichotomization in the context of causal inference in semi-parametric models

Authors: Ori Stitelman
Year: 2010
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2010 — Super Learner In Prediction

Authors: E. Polley, M. Laan
Year: 2010
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2010 — Simple Examples of Estimating Causal Effects Using Targeted Maximum Likelihood Estimation

Authors: Michael Rosenblum, M. J. Laan
Year: 2010
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2010 — Permutation-based Pathway Testing using the Super Learner Algorithm

Authors: P. Chaffee, A. Hubbard, M. V. D. Laan
Year: 2010
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2010 — ctmle: an R package for collaborative targeted maximum likelihood estimation

Authors: Susan Gruber, M. J. Laan
Year: 2010
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2010 — Asymptotic Theory for Cross-validated Targeted Maximum Likelihood Estimation

Authors: Wenjing Zheng, M. J. Laan
Year: 2010
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2009 (9 papers)
2009-09-29 — The Risk of Virologic Failure Decreases with Duration of HIV Suppression, at Greater than 50% Adherence to Antiretroviral Therapy

Authors: M. Rosenblum, S. Deeks, M. J. van der Laan, D. Bangsberg
Year: 2009
Publication Date: 2009-09-29
Venue: PLoS ONE
DOI: 10.1371/journal.pone.0007196
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Background. We hypothesized that the percent adherence to antiretroviral therapy necessary to maintain HIV suppression would decrease with longer duration of viral suppression. Methodology. Eligible participants were identified from the REACH cohort of marginally housed HIV infected adults in San Francisco. Adherence to antiretroviral therapy was measured through pill counts obtained at unannounced visits by research staff to each participant's usual place of residence. Marginal structural models and targeted maximum likelihood estimation methodologies were used to determine the effect of adherence to antiretroviral therapy on the probability of virologic failure during early and late viral suppression. Principal Findings. A total of 221 subjects were studied (median age 44.1 years; median CD4+ T cell nadir 206 cells/mm³). Most subjects were taking the following types of antiretroviral regimens: non-nucleoside reverse transcriptase inhibitor based (37%), ritonavir boosted protease inhibitor based (28%), or unboosted protease inhibitor based (25%). Comparing the probability of failure just after achieving suppression vs. after 12 consecutive months of suppression, there was a statistically significant decrease in the probability of virologic failure for each range of adherence proportions we considered, as long as adherence was greater than 50%. The estimated risk difference, comparing the probability of virologic failure after 1 month vs. after 12 months of continuous viral suppression was 0.47 (95% CI 0.23–0.63) at 50–74% adherence, 0.29 (CI 0.03–0.50) at 75–89% adherence, and 0.36 (CI 0.23–0.48) at 90–100% adherence. Conclusions. The risk of virologic failure for adherence greater than 50% declines with longer duration of continuous suppression. While high adherence is required to maximize the probability of durable viral suppression, the range of adherence capable of sustaining viral suppression is wider after prolonged periods of viral suppression.

2009-06-22 — A Web-Based Learning Platform Based on Super Learning Objects “Chao Zi”

Authors: Ning Kang, Tao Wang, Shengquan Yu, Heping Hao
Year: 2009
Publication Date: 2009-06-22
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

2009-01-15 — Covariate adjustment in randomized trials with binary outcomes: Targeted maximum likelihood estimation

Authors: K. Moore, M. J. Laan
Year: 2009
Publication Date: 2009-01-15
Venue: Statistics in Medicine
DOI: 10.1002/sim.3445
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2009 — tmleLite: A simplified R package for targeted maximum likelihood estimation

Authors: Susan Gruber, M. J. Laan
Year: 2009
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2009 — Targeted Maximum Likelihood Estimation: A Gentle Introduction

Authors: Susan Gruber, M. J. Laan
Year: 2009
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2009 — Targeted maximum likelihood estimation of treatment effects in randomized controlled trials and drug safety analysis

Authors: K. Moore
Year: 2009
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2009 — Readings in Targeted Maximum Likelihood Estimation

Authors: M. J. Laan, Sherri Rose, Susan Gruber
Year: 2009
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2009 — Collaborative Targeted Maximum Likelihood Estimation

Authors: Susan Gruber
Year: 2009
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2009 — Causal Inference for Nested Case-Control Studies using Targeted Maximum Likelihood Estimation

Authors: Sherri Rose, M. J. Laan
Year: 2009
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2008 (1 paper)
2008 — A Guide to Causal Parameters in Case-Control Designs: Targeted Maximum Likelihood Estimation

Authors: Sherri Rose, M. J. Laan
Year: 2008
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

2007 (1 paper)
2007 — Super Learner

Authors: Mark van der Laan, E. Polley, A. Hubbard
Year: 2007
Venue: Statistical Applications in Genetics and Molecular Biology
DOI: 10.2202/1544-6115.1309
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

2006 (1 paper)
2006 — Super Learning: An Application to Prediction of HIV-1 Drug Susceptibility

Authors: S. Sinisi, M. Petersen, M. Laan
Year: 2006
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

Unknown (6 papers)
Unknown — Super Learner Analysis of Electronic Adherence Data Improves Viral Prediction and May Provide Strategies for Selective HIV RNA Monitoring

Authors: Maya L Petersen, E. LeDell, Joshua Schwab, V. Sarovar, Robert Gross, Nancy Reynolds, J.E. Haberer, Kathy Goggin, Carol E. Golin, Julia H. Arnsten, Marc Rosen, R. Remien, David Etoori, Ira B. Wilson, J. Simoni, J. Erlen, Mark J. van der Laan, Honghu Liu, D. Bangsberg
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Abstract unavailable from Semantic Scholar.

Unknown — Handling missing data for causal effect estimation in cohort studies using Targeted Maximum Likelihood Estimation

Authors: Unknown
Link: Semantic Scholar
Matched Keywords: targeted maximum likelihood estimation

Abstract:
Abstract unavailable from Semantic Scholar.

Unknown — Bagging for the highly adaptive lasso

Authors: Unknown
Link: Semantic Scholar
Matched Keywords: highly adaptive lasso

Abstract:
Abstract unavailable from Semantic Scholar.

Unknown — Analyzing NYPD stop, question, and frisk with machine learning techniques

Authors: Passiri Bodhidatta
DOI: 10.58837/chula.the.2021.200
Link: Semantic Scholar
Matched Keywords: super learner

Abstract:
Although stops from the “Stop, Question, and Frisk” program have decreased dramatically after the New York Police Department (NYPD) reform in 2013, the unnecessary stops and weapon use against innocent citizens remain critical problems. This study analyzes the stops during 2014–2019, using three tree-based machine learning approaches: Decision Tree, Random Forest, and XGBoost. Models for predicting stops that resulted in a conviction and police's level of force used are developed and driving factors are identified. Results show that XGBoost outperformed other models in both predictions. The performance of Guilty Prediction was at 65.9% F1 score and 84.0% accuracy. For Level of Force Prediction, the F1 scores obtained for “Level 1” and “Level 2” were 40.7% and 35.0% respectively, with 80.4% overall accuracy. The findings indicated that the presence of a weapon implies a suspect's conviction. Despite that, numerous unnecessary stops are likely driven by inaccurate assumptions about a suspect's weapon possession, which lead to police's gunfire usage against innocent citizens. Additionally, this study explores a hybrid technique called Super Learner. Experiments on various structures of Super Learners are performed. For base models, Super Learners can improve performance from their own base models when using untuned base models but do not improve when using tuned base models. The performance of base models also played a significant role in the performance of Super Learners, namely having high-performance base models improved meta models' performance, and vice versa. For meta models, XGBoost and Logistic Regression outperform other meta models across both predictions.

Unknown — Analysis of Breast Cancer Risk Factors Data: Association Rule Mining based on Ethnic Groups and Classification using Super Learning

Authors: Md Faisal Kabir, S. Ludwig, Abu Saleh
Link: Semantic Scholar
Matched Keywords: super learning

Abstract:
Abstract unavailable from Semantic Scholar.

Unknown — A Longitudinal TMLE of a Mean Outcome A.1 Observed Data and Likelihood

Authors: Unknown
Link: Semantic Scholar
Matched Keywords: tmle

Abstract:
Abstract unavailable from Semantic Scholar.