Feature Selection with Distance Correlation
- Link:
- Author:
- Year of publication:
- 2024
- Media type:
- Text
- Keywords:
- High Energy Physics - Phenomenology
- Computer Science - Machine Learning
- High Energy Physics - Experiment
- Physics - Data Analysis
- Statistics and Probability
- Description:
Choosing which properties of the data to use as input to multivariate decision algorithms - also known as feature selection - is an important step in solving any problem with machine learning. While there is a clear trend towards training sophisticated deep networks on large numbers of relatively unprocessed inputs (so-called automated feature engineering), for many tasks in physics, sets of theoretically well-motivated and well-understood features already exist. Working with such features can bring many benefits, including greater interpretability, reduced training and run time, and enhanced stability and robustness. We develop a new feature selection method based on distance correlation, and demonstrate its effectiveness on the tasks of boosted top- and W-tagging. Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures using only ten features and two orders of magnitude fewer model parameters.
- License:
- info:eu-repo/semantics/openAccess
- Source system:
- Research Information System of UHH (Forschungsinformationssystem der UHH)
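
The description refers to distance correlation as the basis for feature selection. As a rough illustration only, the sketch below computes the standard (biased) sample distance correlation with NumPy and uses it to rank candidate features against a target. The function names `distance_correlation` and `rank_features` are hypothetical, and the plain ranking shown here is an assumption made for illustration; it is not the selection procedure of the paper, which this record does not describe.

```python
import numpy as np


def _centered_distances(v):
    """Doubly centered pairwise Euclidean distance matrix of a sample."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a 1-D array as n samples of one variable
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Biased (V-statistic) sample distance correlation, a value in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()                        # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0                                # a constant sample carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rank_features(features, target):
    """Hypothetical helper: column indices sorted by decreasing distance
    correlation with the target (illustrative, not the paper's method)."""
    features = np.asarray(features, dtype=float)
    scores = [distance_correlation(features[:, j], target)
              for j in range(features.shape[1])]
    return [int(j) for j in np.argsort(scores)[::-1]]
```

Unlike the Pearson coefficient, distance correlation also responds to nonlinear dependence: for `x = np.linspace(-1, 1, 201)` and `y = x**2` the Pearson correlation vanishes, while `distance_correlation(x, y)` is clearly positive. The pairwise distance matrices need O(n²) memory, so a sketch like this is only practical for modest sample sizes.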
Internal metadata
- Source record
- oai:www.edit.fis.uni-hamburg.de:publications/2d83ca91-f70d-455c-917d-44b12ce75f8d