References

Aas, Kjersti, Martin Jullum, and Anders Løland. 2021. “Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values.” Artificial Intelligence 298 (September): 103502. https://doi.org/10.1016/j.artint.2021.103502.
Abdar, Moloud, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, et al. 2021. “A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges.” Information Fusion 76 (December): 243–97. https://doi.org/10.1016/j.inffus.2021.05.008.
Agrawal, Rakesh, and Ramakrishnan Srikant. 1994. “Fast Algorithms for Mining Association Rules in Large Databases.” In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487–99.
Angelopoulos, Anastasios N., and Stephen Bates. 2023. “Conformal Prediction: A Gentle Introduction.” Foundations and Trends in Machine Learning 16 (4): 494–591.
Athalye, Anish, Nicholas Carlini, and David Wagner. 2018. “Obfuscated Gradients Give a False Sense of Security.” In International Conference on Machine Learning (ICML).
Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” The Annals of Statistics 47 (2): 1148–78.
Baldi, Pierre, and Kurt Hornik. 1989. “Neural Networks and Principal Component Analysis: Learning from Examples Without Local Minima.” Neural Networks 2 (1): 53–58.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. “Reconciling Modern Machine-Learning Practice and the Classical Bias-Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–54.
Belkin, Mikhail, and Partha Niyogi. 2003. “Laplacian Eigenmaps for Dimensionality Reduction and Data Representation.” Neural Computation 15 (6): 1373–96. https://doi.org/10.1162/089976603321780317.
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Springer.
Blei, David M. 2012. “Probabilistic Topic Models.” Communications of the ACM 55 (4): 77–84.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993–1022.
Breiman, Leo. 1996. “Stacked Regressions.” Machine Learning 24 (1): 49–64.
Brunet, Jean-Philippe, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov. 2004. “Metagenes and Molecular Pattern Discovery Using Matrix Factorization.” Proceedings of the National Academy of Sciences 101 (12): 4164–69.
Burges, Chris, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. 2005. “Learning to Rank Using Gradient Descent.” In International Conference on Machine Learning (ICML).
Burges, Christopher J. C. 2010. “From RankNet to LambdaRank to LambdaMART: An Overview.” MSR-TR-2010-82. Microsoft Research.
Cao, Zhe, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. “Learning to Rank: From Pairwise Approach to Listwise Approach.” In Proceedings of the 24th International Conference on Machine Learning, 129–36.
Chang, Jonathan, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, and David M. Blei. 2009. “Reading Tea Leaves: How Humans Interpret Topic Models.” In Advances in Neural Information Processing Systems (NeurIPS).
Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. “SMOTE: Synthetic Minority Over-Sampling Technique.” Journal of Artificial Intelligence Research 16 (June): 321–57. https://doi.org/10.1613/jair.953.
Chen, Lisha, and Andreas Buja. 2009. “Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis.” Journal of the American Statistical Association 104 (485): 209–19. https://doi.org/10.1198/jasa.2009.0111.
Chen, Zhao, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. 2018. “GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks.” In International Conference on Machine Learning (ICML).
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1–68.
Chipman, Hugh A., Edward I. George, and Robert E. McCulloch. 2010. “BART: Bayesian Additive Regression Trees.” The Annals of Applied Statistics 4 (1): 266–98. https://doi.org/10.1214/09-AOAS285.
Chollet, François, with J. J. Allaire. 2018. Deep Learning with R. 1st ed. Shelter Island, NY: Manning Publications.
Chollet, François, Tomasz Kalinowski, and J. J. Allaire. 2022. Deep Learning with R. 2nd ed. Manning Publications.
Cohen, Jeremy M., Elan Rosenfeld, and J. Zico Kolter. 2019. “Certified Adversarial Robustness via Randomized Smoothing.” In International Conference on Machine Learning (ICML).
Datta, Anupam, Shayak Sen, and Yair Zick. 2016. “Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems.” 2016 IEEE Symposium on Security and Privacy (SP), May. https://doi.org/10.1109/sp.2016.42.
Domingos, Pedro, and Michael Pazzani. 1997. “On the Optimality of the Simple Bayesian Classifier Under Zero-One Loss.” Machine Learning 29 (2–3): 103–30.
Dutta, Praneet, Man Kit Cheuk, Jonathan S Kim, and Massimo Mascaro. 2019. “AutoML for Contextual Bandits.” https://arxiv.org/abs/1909.03212.
Evgeniou, Theodoros, and Massimiliano Pontil. 2004. “Regularized Multi-Task Learning.” In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).
Fedus, William, Barret Zoph, and Noam Shazeer. 2022. “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.” Journal of Machine Learning Research 23 (120): 1–39.
Freund, Yoav, and Robert E Schapire. 1997. “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.” Journal of Computer and System Sciences 55 (1): 119–39. https://doi.org/10.1006/jcss.1997.1504.
Furnival, George M., and Robert W. Wilson. 1974. “Regressions by Leaps and Bounds.” Technometrics 16 (4): 499–511. https://doi.org/10.1080/00401706.1974.10489231.
Ganin, Yaroslav, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. “Domain-Adversarial Training of Neural Networks.” Journal of Machine Learning Research 17 (59): 1–35.
Gaussier, Eric, and Cyril Goutte. 2005. “Relation Between PLSA and NMF and Implications.” In Proceedings of the 28th Annual International ACM SIGIR Conference, 601–2.
George, Edward I., and Robert E. McCulloch. 1993. “Variable Selection via Gibbs Sampling.” Journal of the American Statistical Association 88 (423): 881–89. https://doi.org/10.1080/01621459.1993.10476353.
Gibbs, Isaac, and Emmanuel J. Candès. 2021. “Adaptive Conformal Inference Under Distribution Shift.” In Advances in Neural Information Processing Systems (NeurIPS).
Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. 2011. “Deep Sparse Rectifier Neural Networks.” In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 15:315–23. Proceedings of Machine Learning Research. PMLR. https://proceedings.mlr.press/v15/glorot11a.html.
Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” In Advances in Neural Information Processing Systems (NeurIPS).
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. 2015. “Explaining and Harnessing Adversarial Examples.” In International Conference on Learning Representations (ICLR).
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
Griffiths, Thomas L., and Mark Steyvers. 2004. “Finding Scientific Topics.” Proceedings of the National Academy of Sciences 101 (suppl 1): 5228–35.
Han, Jiawei, Jian Pei, and Yiwen Yin. 2000. “Mining Frequent Patterns Without Candidate Generation.” In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1–12.
Hastie, Trevor, Jerome Friedman, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Springer New York. https://doi.org/10.1007/978-0-387-21606-5.
Hastie, Trevor, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. 2022. “Surprises in High-Dimensional Ridgeless Least Squares Interpolation.” The Annals of Statistics 50 (2): 949–86.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–7.
Hoffman, Matthew D., David M. Blei, and Francis Bach. 2010. “Online Learning for Latent Dirichlet Allocation.” In Advances in Neural Information Processing Systems (NeurIPS).
Hofmann, Thomas. 1999. “Probabilistic Latent Semantic Indexing.” In Proceedings of the 22nd Annual International ACM SIGIR Conference, 50–57.
Howard, Jeremy, and Sebastian Ruder. 2018. “Universal Language Model Fine-Tuning for Text Classification.” In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Jacobs, Robert A., Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. “Adaptive Mixtures of Local Experts.” Neural Computation 3 (1): 79–87.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. “Statistical Learning.” In An Introduction to Statistical Learning, 15–57. Springer New York. https://doi.org/10.1007/978-1-4614-7138-7_2.
Joachims, Thorsten. 2002. “Optimizing Search Engines Using Clickthrough Data.” In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 133–42.
Jordan, Michael I., and Robert A. Jacobs. 1994. “Hierarchical Mixtures of Experts and the EM Algorithm.” Neural Computation 6 (2): 181–214.
Kendall, Alex, Yarin Gal, and Roberto Cipolla. 2018. “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In International Conference on Learning Representations (ICLR).
Koenker, Roger. 2005. Quantile Regression. Econometric Society Monographs 38. Cambridge University Press. https://doi.org/10.1017/CBO9780511754098.
Koikkalainen, Pasi. 1999. “Tree Structured Self-Organizing Maps.” In Kohonen Maps, 121–30. Elsevier. https://doi.org/10.1016/b978-044450270-4/50009-7.
Kuhn, Max. 2014. “Futility Analysis in the Cross-Validation of Machine Learning Models.” https://arxiv.org/abs/1405.6974.
Kumar, I Elizabeth, Suresh Venkatasubramanian, Carlos Scheidegger, and Sorelle Friedler. 2020. “Problems with Shapley-Value-Based Explanations as Feature Importance Measures.” In International Conference on Machine Learning, 5491–5500. PMLR.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65.
Landsman, Vardit, and Stefan Stremersch. 2020. “The Commercial Consequences of Collective Layoffs: Close the Plant, Lose the Brand?” Journal of Marketing 84 (3): 122–41. https://doi.org/10.1177/0022242919901277.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
Lee, Daniel D., and H. Sebastian Seung. 1999. “Learning the Parts of Objects by Non-Negative Matrix Factorization.” Nature 401 (6755): 788–91. https://doi.org/10.1038/44565.
———. 2001. “Algorithms for Non-Negative Matrix Factorization.” In Advances in Neural Information Processing Systems (NeurIPS).
Lei, Jing, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. 2018. “Distribution-Free Predictive Inference for Regression.” Journal of the American Statistical Association 113 (523): 1094–1111.
Lei, Jing, and Larry Wasserman. 2014. “Distribution-Free Prediction Bands for Non-Parametric Regression.” Journal of the Royal Statistical Society: Series B 76 (1): 71–96.
Linde, Y., A. Buzo, and R. Gray. 1980. “An Algorithm for Vector Quantizer Design.” IEEE Transactions on Communications 28 (1): 84–95. https://doi.org/10.1109/tcom.1980.1094577.
Lipovetsky, Stan, and Michael Conklin. 2001a. “Analysis of Regression in Game Theory Approach.” Applied Stochastic Models in Business and Industry 17 (4): 319–30. https://doi.org/10.1002/asmb.446.
Lipovetsky, Stan, and W. Michael Conklin. 2001b. “Multiobjective Regression Modifications for Collinearity.” Computers and Operations Research 28 (13): 1333–45. https://doi.org/10.1016/s0305-0548(00)00043-5.
Lundberg, Scott M, Gabriel G Erion, and Su-In Lee. 2018. “Consistent Individualized Feature Attribution for Tree Ensembles.” arXiv Preprint arXiv:1802.03888.
Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–77.
MacDonald, Blake, Pritam Ranjan, and Hugh Chipman. 2015. “GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs.” Journal of Statistical Software 64 (12). https://doi.org/10.18637/jss.v064.i12.
Machado, Marcos Roberto, Salma Karray, and Ivaldo Tributino de Sousa. 2019. “LightGBM: An Effective Decision Tree Gradient Boosting Method to Predict Customer Loyalty in the Finance Industry.” 2019 14th International Conference on Computer Science and Education (ICCSE), August. https://doi.org/10.1109/iccse.2019.8845529.
Madry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. “Towards Deep Learning Models Resistant to Adversarial Attacks.” In International Conference on Learning Representations (ICLR).
Mahajan, Vijay, Subhash Sharma, and Robert D. Buzzell. 1993. “Assessing the Impact of Competitive Entry on Market Expansion and Incumbent Sales.” Journal of Marketing 57 (3): 39. https://doi.org/10.2307/1251853.
Merrick, Luke, and Ankur Taly. 2020. “The Explanation Game: Explaining Machine Learning Models Using Shapley Values.” In International Cross-Domain Conference for Machine Learning and Knowledge Extraction, 17–38. Springer.
Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. “Model Cards for Model Reporting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM. https://doi.org/10.1145/3287560.3287596.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
Nakkiran, Preetum, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. 2021. “Deep Double Descent: Where Bigger Models and More Data Hurt.” Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124003.
Nakkiran, Preetum, Prayaag Venkat, Sham Kakade, and Tengyu Ma. 2021. “Optimal Regularization Can Mitigate Double Descent.” In International Conference on Learning Representations (ICLR).
Ng, Andrew Y., and Michael I. Jordan. 2002. “On Discriminative Vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes.” In Advances in Neural Information Processing Systems (NeurIPS), 14:841–48.
Nie, Xinkun, and Stefan Wager. 2021. “Quasi-Oracle Estimation of Heterogeneous Treatment Effects.” Biometrika 108 (2): 299–319.
Pan, Sinno Jialin, and Qiang Yang. 2010. “A Survey on Transfer Learning.” IEEE Transactions on Knowledge and Data Engineering 22 (10): 1345–59.
Rabiner, Lawrence R. 1989. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE 77 (2): 257–86.
Radcliffe, Nicholas J. 2007. “Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models.” Direct Marketing Analytics Journal, 14–21.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning. The MIT Press. https://doi.org/10.7551/mitpress/3206.001.0001.
Ribeiro, Marco, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?: Explaining the Predictions of Any Classifier.” Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. https://doi.org/10.18653/v1/n16-3020.
Ročková, Veronika, and Stéphanie van der Pas. 2020. “Posterior Concentration for Bayesian Regression Trees and Forests.” The Annals of Statistics 48 (4): 2108–31. https://doi.org/10.1214/19-AOS1879.
Romano, Yaniv, Evan Patterson, and Emmanuel J. Candès. 2019. “Conformalized Quantile Regression.” In Advances in Neural Information Processing Systems (NeurIPS).
Roweis, Sam T., and Lawrence K. Saul. 2000. “Nonlinear Dimensionality Reduction by Locally Linear Embedding.” Science 290 (5500): 2323–26. https://doi.org/10.1126/science.290.5500.2323.
Ruder, Sebastian. 2017. “An Overview of Multi-Task Learning in Deep Neural Networks.” arXiv Preprint arXiv:1706.05098.
Samek, Wojciech, Grégoire Montavon, Sebastian Lapuschkin, Christopher J. Anders, and Klaus-Robert Müller. 2021. “Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications.” Proceedings of the IEEE 109 (3): 247–78. https://doi.org/10.1109/jproc.2021.3060483.
Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” In International Conference on Learning Representations (ICLR).
Sparapani, Rodney, Charles Spanbauer, and Robert McCulloch. 2021. “Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package.” Journal of Statistical Software 97 (1). https://doi.org/10.18637/jss.v097.i01.
Štrumbelj, Erik, and Igor Kononenko. 2013. “Explaining Prediction Models and Individual Predictions with Feature Contributions.” Knowledge and Information Systems 41 (3): 647–65. https://doi.org/10.1007/s10115-013-0679-x.
Sugiyama, Masashi, Matthias Krauledat, and Klaus-Robert Müller. 2007. “Covariate Shift Adaptation by Importance Weighted Cross Validation.” Journal of Machine Learning Research 8: 985–1005.
Sundararajan, Mukund, and Amir Najmi. 2020. “The Many Shapley Values for Model Explanation.” In International Conference on Machine Learning, 9269–78. PMLR.
Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. “Intriguing Properties of Neural Networks.” In International Conference on Learning Representations (ICLR).
Tenenbaum, Joshua B., Vin de Silva, and John C. Langford. 2000. “A Global Geometric Framework for Nonlinear Dimensionality Reduction.” Science 290 (5500): 2319–23. https://doi.org/10.1126/science.290.5500.2319.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection Via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Tibshirani, Ryan J., Rina Foygel Barber, Emmanuel J. Candès, and Aaditya Ramdas. 2019. “Conformal Prediction Under Covariate Shift.” In Advances in Neural Information Processing Systems (NeurIPS).
Tsipras, Dimitris, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. 2019. “Robustness May Be at Odds with Accuracy.” In International Conference on Learning Representations (ICLR).
Vincent, Pascal. 2011. “A Connection Between Score Matching and Denoising Autoencoders.” Neural Computation 23 (7): 1661–74.
Vovk, Vladimir. 2012. “Conditional Validity of Inductive Conformal Predictors.” In Proceedings of the Asian Conference on Machine Learning (ACML).
Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. 2005. Algorithmic Learning in a Random World. Springer.
Wager, Stefan, and Susan Athey. 2018. “Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests.” Journal of the American Statistical Association 113 (523): 1228–42.
Wikle, Christopher K., Andrew Zammit-Mangion, and Noel Cressie. 2019. “Introduction to Spatio-Temporal Statistics.” In Spatio-Temporal Statistics with R, 1–16. Chapman and Hall/CRC. https://doi.org/10.1201/9781351769723-1.
Wolpert, David H. 1992. “Stacked Generalization.” Neural Networks 5 (2): 241–59.
Xia, Fen, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. 2008. “Listwise Approach to Learning to Rank: Theory and Algorithm.” In Proceedings of the 25th International Conference on Machine Learning, 1192–99.
Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. “How Transferable Are Features in Deep Neural Networks?” In Advances in Neural Information Processing Systems (NeurIPS).
Yu, Keming, Zudi Lu, and Julian Stander. 2003. “Quantile Regression: Applications and Current Research Areas.” Journal of the Royal Statistical Society: Series D (The Statistician) 52 (3): 331–50. https://www.jstor.org/stable/4128208.
Zhang, Yu, and Qiang Yang. 2021. “A Survey on Multi-Task Learning.” IEEE Transactions on Knowledge and Data Engineering 34 (12): 5586–5609.
Zhong, Guoqiang, and Guohua Yue. 2020. “Attention Recurrent Neural Networks for Image-Based Sequence Text Recognition.” In Pattern Recognition, 793–806. Springer International Publishing. https://doi.org/10.1007/978-3-030-41404-7_56.