Encoding high-cardinality string categorical variables
Abstract: Statistical models often require one-hot encoding of categorical variables. This strategy breaks down as the number of categories grows, because the feature vectors become high-dimensional. One-hot encoding also fails to capture morphological information in string entries. High-cardinality string categorical variables…
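The two limitations of one-hot encoding noted in the abstract can be illustrated with a minimal sketch. It is not the method proposed in the paper: character 3-gram Jaccard similarity is swapped in here only as a simple way to show what "morphological information" means, and the function names (`one_hot`, `char_ngrams`, `jaccard`, `dot`) and the example entries are hypothetical.

```python
def one_hot(categories):
    """Map each distinct string to an indicator vector, one dimension per category."""
    vocab = sorted(set(categories))
    index = {c: i for i, c in enumerate(vocab)}
    return {c: [1 if i == index[c] else 0 for i in range(len(vocab))] for c in vocab}


def char_ngrams(s, n=3):
    """Set of character n-grams of a lowercased, space-padded string."""
    padded = f" {s.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between the character 3-gram sets of two strings."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


# Illustrative entries (assumed, not taken from the paper).
entries = ["London", "Londres", "Paris"]
vectors = one_hot(entries)

# The vector length equals the number of distinct categories,
# so it grows with cardinality.
print(len(vectors["London"]))

# One-hot vectors of distinct categories are orthogonal,
# however similar the strings look.
print(dot(vectors["London"], vectors["Londres"]))

# A string-aware similarity sees the shared substrings.
print(jaccard("London", "Londres") > jaccard("London", "Paris"))
```

In this toy setting, "London" and "Londres" are exactly as dissimilar as "London" and "Paris" under one-hot encoding, whereas the n-gram comparison ranks the first pair as closer.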