References
Aas, Kjersti, Martin Jullum, and Anders Løland. 2021. “Explaining
Individual Predictions When Features Are Dependent: More Accurate
Approximations to Shapley Values.” Artificial
Intelligence 298 (September): 103502. https://doi.org/10.1016/j.artint.2021.103502.
Abdar, Moloud, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li
Liu, Mohammad Ghavamzadeh, Paul Fieguth, et al. 2021. “A Review of
Uncertainty Quantification in Deep Learning: Techniques, Applications
and Challenges.” Information Fusion 76 (December):
243–97. https://doi.org/10.1016/j.inffus.2021.05.008.
Agrawal, Rakesh, and Ramakrishnan Srikant. 1994. “Fast Algorithms
for Mining Association Rules in Large Databases.” In
Proceedings of the 20th International Conference on Very Large Data
Bases (VLDB), 487–99.
Angelopoulos, Anastasios N., and Stephen Bates. 2023. “Conformal
Prediction: A Gentle Introduction.” Foundations and Trends in
Machine Learning 16 (4): 494–591.
Athalye, Anish, Nicholas Carlini, and David Wagner. 2018.
“Obfuscated Gradients Give a False Sense of Security.” In
International Conference on Machine Learning (ICML).
Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019.
“Generalized Random Forests.” The Annals of
Statistics 47 (2): 1148–78.
Baldi, Pierre, and Kurt Hornik. 1989. “Neural Networks and
Principal Component Analysis: Learning from Examples Without Local
Minima.” Neural Networks 2 (1): 53–58.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019.
“Reconciling Modern Machine-Learning Practice and the Classical
Bias-Variance Trade-Off.” Proceedings of the National Academy
of Sciences 116 (32): 15849–54.
Belkin, Mikhail, and Partha Niyogi. 2003. “Laplacian Eigenmaps for
Dimensionality Reduction and Data Representation.” Neural
Computation 15 (6): 1373–96. https://doi.org/10.1162/089976603321780317.
Bishop, Christopher M. 2006. Pattern Recognition and Machine
Learning. Springer.
Blei, David M. 2012. “Probabilistic Topic Models.”
Communications of the ACM 55 (4): 77–84.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent
Dirichlet Allocation.” Journal of Machine Learning
Research 3: 993–1022.
Breiman, Leo. 1996. “Stacked Regressions.” Machine
Learning 24 (1): 49–64.
Brunet, Jean-Philippe, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov.
2004. “Metagenes and Molecular Pattern Discovery Using Matrix
Factorization.” Proceedings of the National Academy of
Sciences 101 (12): 4164–69.
Burges, Chris, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole
Hamilton, and Greg Hullender. 2005. “Learning to Rank Using
Gradient Descent.” In International Conference on Machine
Learning (ICML).
Burges, Christopher J. C. 2010. “From RankNet to LambdaRank to
LambdaMART: An Overview.” MSR-TR-2010-82. Microsoft Research.
Cao, Zhe, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007.
“Learning to Rank: From Pairwise Approach to Listwise
Approach.” In Proceedings of the 24th International
Conference on Machine Learning, 129–36.
Chang, Jonathan, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, and David
M. Blei. 2009. “Reading Tea Leaves: How Humans Interpret Topic
Models.” In Advances in Neural Information Processing Systems
(NeurIPS).
Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002.
“SMOTE: Synthetic Minority Over-Sampling Technique.”
Journal of Artificial Intelligence Research 16 (June): 321–57.
https://doi.org/10.1613/jair.953.
Chen, Lisha, and Andreas Buja. 2009. “Local Multidimensional
Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity
Analysis.” Journal of the American Statistical
Association 104 (485): 209–19. https://doi.org/10.1198/jasa.2009.0111.
Chen, Zhao, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich.
2018. “GradNorm: Gradient Normalization for Adaptive Loss
Balancing in Deep Multitask Networks.” In International
Conference on Machine Learning (ICML).
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo,
Christian Hansen, Whitney Newey, and James Robins. 2018.
“Double/Debiased Machine Learning for Treatment and Structural
Parameters.” The Econometrics Journal 21 (1): C1–68.
Chipman, Hugh A., Edward I. George, and Robert E. McCulloch. 2010.
“BART: Bayesian Additive Regression Trees.”
The Annals of Applied Statistics 4 (1): 266–98. https://doi.org/10.1214/09-AOAS285.
Chollet, François. 2018. Deep Learning with R. With J. J. Allaire.
1st ed. Shelter Island, NY: Manning Publications.
Chollet, François, Tomasz Kalinowski, and J. J. Allaire. 2022. Deep
Learning with R. 2nd ed. Manning Publications.
Cohen, Jeremy M., Elan Rosenfeld, and J. Zico Kolter. 2019.
“Certified Adversarial Robustness via Randomized
Smoothing.” In International Conference on Machine Learning
(ICML).
Datta, Anupam, Shayak Sen, and Yair Zick. 2016. “Algorithmic
Transparency via Quantitative Input Influence: Theory and Experiments
with Learning Systems.” 2016 IEEE Symposium on Security and
Privacy (SP), May. https://doi.org/10.1109/sp.2016.42.
Domingos, Pedro, and Michael Pazzani. 1997. “On the Optimality of
the Simple Bayesian Classifier Under Zero-One Loss.” Machine
Learning 29 (2–3): 103–30.
Dutta, Praneet, Man Kit Cheuk, Jonathan S. Kim, and Massimo Mascaro.
2019. “AutoML for Contextual Bandits.” https://arxiv.org/abs/1909.03212.
Evgeniou, Theodoros, and Massimiliano Pontil. 2004. “Regularized
Multi-Task Learning.” In Proceedings of the ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining
(KDD).
Fedus, William, Barret Zoph, and Noam Shazeer. 2022. “Switch
Transformers: Scaling to Trillion Parameter Models with Simple and
Efficient Sparsity.” Journal of Machine Learning
Research 23 (120): 1–39.
Freund, Yoav, and Robert E. Schapire. 1997. “A Decision-Theoretic
Generalization of On-Line Learning and an Application to
Boosting.” Journal of Computer and System Sciences 55
(1): 119–39. https://doi.org/10.1006/jcss.1997.1504.
Furnival, George M., and Robert W. Wilson. 1974. “Regressions by
Leaps and Bounds.” Technometrics 16 (4): 499–511. https://doi.org/10.1080/00401706.1974.10489231.
Ganin, Yaroslav, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo
Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky.
2016. “Domain-Adversarial Training of Neural Networks.”
Journal of Machine Learning Research 17 (59): 1–35.
Gaussier, Eric, and Cyril Goutte. 2005. “Relation Between PLSA and
NMF and Implications.” In Proceedings of the 28th Annual
International ACM SIGIR Conference, 601–2.
George, Edward I., and Robert E. McCulloch. 1993. “Variable
Selection via Gibbs Sampling.” Journal of the American
Statistical Association 88 (423): 881–89. https://doi.org/10.1080/01621459.1993.10476353.
Gibbs, Isaac, and Emmanuel J. Candès. 2021. “Adaptive Conformal
Inference Under Distribution Shift.” In Advances in Neural
Information Processing Systems (NeurIPS).
Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. 2011. “Deep
Sparse Rectifier Neural Networks.” In Proceedings of the
Fourteenth International Conference on Artificial Intelligence and
Statistics (AISTATS), 15:315–23. Proceedings of Machine Learning
Research. PMLR. https://proceedings.mlr.press/v15/glorot11a.html.
Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014.
“Generative Adversarial Nets.” In Advances in Neural
Information Processing Systems (NeurIPS).
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. 2015.
“Explaining and Harnessing Adversarial Examples.” In
International Conference on Learning Representations (ICLR).
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep
Learning. MIT Press.
Griffiths, Thomas L., and Mark Steyvers. 2004. “Finding Scientific
Topics.” Proceedings of the National Academy of Sciences
101 (suppl 1): 5228–35.
Han, Jiawei, Jian Pei, and Yiwen Yin. 2000. “Mining Frequent
Patterns Without Candidate Generation.” In Proceedings of the
ACM SIGMOD International Conference on Management of Data, 1–12.
Hastie, Trevor, Jerome Friedman, and Robert Tibshirani. 2001. The
Elements of Statistical Learning. Springer New York. https://doi.org/10.1007/978-0-387-21606-5.
Hastie, Trevor, Andrea Montanari, Saharon Rosset, and Ryan J.
Tibshirani. 2022. “Surprises in High-Dimensional Ridgeless Least
Squares Interpolation.” The Annals of Statistics 50 (2):
949–86.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing
the Dimensionality of Data with Neural Networks.”
Science 313 (5786): 504–7.
Hoffman, Matthew D., David M. Blei, and Francis Bach. 2010.
“Online Learning for Latent Dirichlet Allocation.” In
Advances in Neural Information Processing Systems (NeurIPS).
Hofmann, Thomas. 1999. “Probabilistic Latent Semantic
Indexing.” In Proceedings of the 22nd Annual International
ACM SIGIR Conference, 50–57.
Howard, Jeremy, and Sebastian Ruder. 2018. “Universal Language
Model Fine-Tuning for Text Classification.” In Proceedings of
the Annual Meeting of the Association for Computational Linguistics
(ACL).
Jacobs, Robert A., Michael I. Jordan, Steven J. Nowlan, and Geoffrey E.
Hinton. 1991. “Adaptive Mixtures of Local Experts.”
Neural Computation 3 (1): 79–87.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
2013. “Statistical Learning.” In An Introduction to Statistical
Learning: With Applications in R, 15–57. Springer New York.
https://doi.org/10.1007/978-1-4614-7138-7_2.
Joachims, Thorsten. 2002. “Optimizing Search Engines Using
Clickthrough Data.” In Proceedings of the Eighth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining,
133–42.
Jordan, Michael I., and Robert A. Jacobs. 1994. “Hierarchical
Mixtures of Experts and the EM Algorithm.” Neural
Computation 6 (2): 181–214.
Kendall, Alex, Yarin Gal, and Roberto Cipolla. 2018. “Multi-Task
Learning Using Uncertainty to Weigh Losses for Scene Geometry and
Semantics.” In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding
Variational Bayes.” In International Conference on Learning
Representations (ICLR).
Koenker, Roger. 2005. Quantile Regression. Econometric Society
Monographs 38. Cambridge University Press. https://doi.org/10.1017/CBO9780511754098.
Koikkalainen, Pasi. 1999. “Tree Structured Self-Organizing
Maps.” In Kohonen Maps, 121–30. Elsevier. https://doi.org/10.1016/b978-044450270-4/50009-7.
Kuhn, Max. 2014. “Futility Analysis in the Cross-Validation of
Machine Learning Models.” https://arxiv.org/abs/1405.6974.
Kumar, I. Elizabeth, Suresh Venkatasubramanian, Carlos Scheidegger, and
Sorelle Friedler. 2020. “Problems with Shapley-Value-Based
Explanations as Feature Importance Measures.” In
International Conference on Machine Learning, 5491–5500. PMLR.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019.
“Metalearners for Estimating Heterogeneous Treatment Effects Using
Machine Learning.” Proceedings of the National Academy of
Sciences 116 (10): 4156–65.
Landsman, Vardit, and Stefan Stremersch. 2020. “The Commercial
Consequences of Collective Layoffs: Close the Plant, Lose the
Brand?” Journal of Marketing 84 (3): 122–41. https://doi.org/10.1177/0022242919901277.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep
Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
Lee, Daniel D., and H. Sebastian Seung. 1999. “Learning the Parts
of Objects by Non-Negative Matrix Factorization.” Nature
401 (6755): 788–91. https://doi.org/10.1038/44565.
———. 2001. “Algorithms for Non-Negative Matrix
Factorization.” In Advances in Neural Information Processing
Systems (NeurIPS).
Lei, Jing, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry
Wasserman. 2018. “Distribution-Free Predictive Inference for
Regression.” Journal of the American Statistical
Association 113 (523): 1094–1111.
Lei, Jing, and Larry Wasserman. 2014. “Distribution-Free
Prediction Bands for Non-Parametric Regression.” Journal of
the Royal Statistical Society: Series B 76 (1): 71–96.
Linde, Y., A. Buzo, and R. Gray. 1980. “An Algorithm for Vector
Quantizer Design.” IEEE Transactions on Communications
28 (1): 84–95. https://doi.org/10.1109/tcom.1980.1094577.
Lipovetsky, Stan, and Michael Conklin. 2001a. “Analysis of
Regression in Game Theory Approach.” Applied Stochastic
Models in Business and Industry 17 (4): 319–30. https://doi.org/10.1002/asmb.446.
Lipovetsky, Stan, and W. Michael Conklin. 2001b. “Multiobjective
Regression Modifications for Collinearity.” Computers and
Operations Research 28 (13): 1333–45. https://doi.org/10.1016/s0305-0548(00)00043-5.
Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. 2018.
“Consistent Individualized Feature Attribution for Tree
Ensembles.” arXiv Preprint arXiv:1802.03888.
Lundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to
Interpreting Model Predictions.” In Proceedings of the 31st
International Conference on Neural Information Processing Systems,
4768–77.
MacDonald, Blake, Pritam Ranjan, and Hugh Chipman. 2015. “GPfit:
An R Package for Fitting a Gaussian Process Model to Deterministic
Simulator Outputs.” Journal of Statistical Software 64
(12). https://doi.org/10.18637/jss.v064.i12.
Machado, Marcos Roberto, Salma Karray, and Ivaldo Tributino de Sousa.
2019. “LightGBM: An Effective Decision Tree Gradient Boosting
Method to Predict Customer Loyalty in the Finance Industry.”
2019 14th International Conference on Computer Science and Education
(ICCSE), August. https://doi.org/10.1109/iccse.2019.8845529.
Madry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras,
and Adrian Vladu. 2018. “Towards Deep Learning Models Resistant to
Adversarial Attacks.” In International Conference on Learning
Representations (ICLR).
Mahajan, Vijay, Subhash Sharma, and Robert D. Buzzell. 1993.
“Assessing the Impact of Competitive Entry on Market Expansion and
Incumbent Sales.” Journal of Marketing 57 (3): 39. https://doi.org/10.2307/1251853.
Merrick, Luke, and Ankur Taly. 2020. “The Explanation Game:
Explaining Machine Learning Models Using Shapley Values.” In
International Cross-Domain Conference for Machine Learning and
Knowledge Extraction, 17–38. Springer.
Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy
Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and
Timnit Gebru. 2019. “Model Cards for Model Reporting.” In
Proceedings of the Conference on Fairness, Accountability, and
Transparency. ACM. https://doi.org/10.1145/3287560.3287596.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic
Perspective. MIT Press.
Nakkiran, Preetum, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak,
and Ilya Sutskever. 2021. “Deep Double Descent: Where Bigger
Models and More Data Hurt.” Journal of Statistical Mechanics:
Theory and Experiment 2021 (12): 124003.
Nakkiran, Preetum, Prayaag Venkat, Sham Kakade, and Tengyu Ma. 2021.
“Optimal Regularization Can Mitigate Double Descent.” In
International Conference on Learning Representations (ICLR).
Ng, Andrew Y., and Michael I. Jordan. 2002. “On Discriminative Vs.
Generative Classifiers: A Comparison of Logistic Regression and Naive
Bayes.” In Advances in Neural Information
Processing Systems (NIPS), 14:841–48.
Nie, Xinkun, and Stefan Wager. 2021. “Quasi-Oracle Estimation of
Heterogeneous Treatment Effects.” Biometrika 108 (2):
299–319.
Pan, Sinno Jialin, and Qiang Yang. 2010. “A Survey on Transfer
Learning.” IEEE Transactions on Knowledge and Data
Engineering 22 (10): 1345–59.
Rabiner, Lawrence R. 1989. “A Tutorial on Hidden Markov Models and
Selected Applications in Speech Recognition.” Proceedings of
the IEEE 77 (2): 257–86.
Radcliffe, Nicholas J. 2007. “Using Control Groups to Target on
Predicted Lift: Building and Assessing Uplift Models.” Direct
Marketing Analytics Journal, 14–21.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2005.
Gaussian Processes for Machine Learning. The MIT Press. https://doi.org/10.7551/mitpress/3206.001.0001.
Ribeiro, Marco, Sameer Singh, and Carlos Guestrin. 2016.
“‘Why Should I Trust You?’: Explaining
the Predictions of Any Classifier.” Proceedings of the 2016
Conference of the North American Chapter of the Association for
Computational Linguistics: Demonstrations. https://doi.org/10.18653/v1/n16-3020.
Ročková, Veronika, and Stéphanie van der Pas. 2020. “Posterior
Concentration for Bayesian Regression Trees and
Forests.” The Annals of Statistics 48 (4): 2108–31. https://doi.org/10.1214/19-AOS1879.
Romano, Yaniv, Evan Patterson, and Emmanuel J. Candès. 2019.
“Conformalized Quantile Regression.” In Advances in
Neural Information Processing Systems (NeurIPS).
Roweis, Sam T., and Lawrence K. Saul. 2000. “Nonlinear
Dimensionality Reduction by Locally Linear Embedding.”
Science 290 (5500): 2323–26. https://doi.org/10.1126/science.290.5500.2323.
Ruder, Sebastian. 2017. “An Overview of Multi-Task Learning in
Deep Neural Networks.” arXiv Preprint arXiv:1706.05098.
Samek, Wojciech, Grégoire Montavon, Sebastian Lapuschkin, Christopher J.
Anders, and Klaus-Robert Müller. 2021. “Explaining Deep Neural
Networks and Beyond: A Review of Methods and Applications.”
Proceedings of the IEEE 109 (3): 247–78. https://doi.org/10.1109/jproc.2021.3060483.
Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc
Le, Geoffrey Hinton, and Jeff Dean. 2017. “Outrageously Large
Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” In
International Conference on Learning Representations (ICLR).
Sparapani, Rodney, Charles Spanbauer, and Robert McCulloch. 2021.
“Nonparametric Machine Learning and Efficient Computation with
Bayesian Additive Regression Trees: The BART R Package.”
Journal of Statistical Software 97 (1). https://doi.org/10.18637/jss.v097.i01.
Štrumbelj, Erik, and Igor Kononenko. 2013. “Explaining Prediction
Models and Individual Predictions with Feature Contributions.”
Knowledge and Information Systems 41 (3): 647–65. https://doi.org/10.1007/s10115-013-0679-x.
Sugiyama, Masashi, Matthias Krauledat, and Klaus-Robert Müller. 2007.
“Covariate Shift Adaptation by Importance Weighted Cross
Validation.” Journal of Machine Learning Research 8:
985–1005.
Sundararajan, Mukund, and Amir Najmi. 2020. “The Many Shapley
Values for Model Explanation.” In International Conference on
Machine Learning, 9269–78. PMLR.
Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna,
Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. “Intriguing
Properties of Neural Networks.” In International Conference
on Learning Representations (ICLR).
Tenenbaum, Joshua B., Vin de Silva, and John C. Langford. 2000. “A
Global Geometric Framework for Nonlinear Dimensionality
Reduction.” Science 290 (5500): 2319–23. https://doi.org/10.1126/science.290.5500.2319.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via
the Lasso.” Journal of the Royal Statistical Society: Series
B (Methodological) 58 (1): 267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Tibshirani, Ryan J., Rina Foygel Barber, Emmanuel J. Candès, and Aaditya
Ramdas. 2019. “Conformal Prediction Under Covariate Shift.”
In Advances in Neural Information Processing Systems (NeurIPS).
Tsipras, Dimitris, Shibani Santurkar, Logan Engstrom, Alexander Turner,
and Aleksander Madry. 2019. “Robustness May Be at Odds with
Accuracy.” In International Conference on Learning
Representations (ICLR).
Vincent, Pascal. 2011. “A Connection Between Score Matching and
Denoising Autoencoders.” Neural Computation 23 (7):
1661–74.
Vovk, Vladimir. 2012. “Conditional Validity of Inductive Conformal
Predictors.” In Proceedings of the Asian Conference on
Machine Learning (ACML).
Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. 2005.
Algorithmic Learning in a Random World. Springer.
Wager, Stefan, and Susan Athey. 2018. “Estimation and Inference of
Heterogeneous Treatment Effects Using Random Forests.”
Journal of the American Statistical Association 113 (523):
1228–42.
Wikle, Christopher K., Andrew Zammit-Mangion, and Noel Cressie. 2019.
“Introduction to Spatio-Temporal Statistics.” In Spatio-Temporal
Statistics with R, 1–16. Chapman and Hall/CRC. https://doi.org/10.1201/9781351769723-1.
Wolpert, David H. 1992. “Stacked Generalization.”
Neural Networks 5 (2): 241–59.
Xia, Fen, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. 2008.
“Listwise Approach to Learning to Rank: Theory and
Algorithm.” In Proceedings of the 25th International
Conference on Machine Learning, 1192–99.
Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014.
“How Transferable Are Features in Deep Neural Networks?” In
Advances in Neural Information Processing Systems (NeurIPS).
Yu, Keming, Zudi Lu, and Julian Stander. 2003. “Quantile
Regression: Applications and Current Research Areas.” Journal
of the Royal Statistical Society: Series D (The Statistician) 52
(3): 331–50. https://www.jstor.org/stable/4128208.
Zhang, Yu, and Qiang Yang. 2021. “A Survey on Multi-Task
Learning.” IEEE Transactions on Knowledge and Data
Engineering 34 (12): 5586–5609.
Zhong, Guoqiang, and Guohua Yue. 2020. “Attention Recurrent Neural
Networks for Image-Based Sequence Text Recognition.” In, 793–806.
Springer International Publishing. https://doi.org/10.1007/978-3-030-41404-7_56.