Public 46th / Private 34th Solution

Method

pre-processing

  • Add statistical features

  • Add PCA features

  • Variance Threshold

    • Feature selector that removes all low-variance features.

  • RankGauss (see the sketch below)

    • Space the sorted feature values evenly between -1 and 1

    • Apply the inverse error function → yields an approximately Gaussian distribution
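
For reference, a minimal sketch of the variance-threshold and RankGauss steps. The threshold value, the dummy data, and the helper name are placeholders, not the values actually used in the solution:

```python
import numpy as np
from scipy.special import erfinv
from sklearn.feature_selection import VarianceThreshold

def rank_gauss(col: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RankGauss: rank the values, space the ranks evenly in (-1, 1),
    then apply the inverse error function to get a ~Gaussian column."""
    ranks = np.argsort(np.argsort(col))            # dense ranks 0..n-1
    spaced = ranks / (len(col) - 1) * 2.0 - 1.0    # evenly spaced in [-1, 1]
    spaced = np.clip(spaced, -1 + eps, 1 - eps)    # keep erfinv finite at ±1
    return erfinv(spaced)

X = np.random.randn(1000, 50) * np.linspace(0.1, 2.0, 50)  # placeholder data

selector = VarianceThreshold(threshold=0.5)        # threshold is a placeholder
X_sel = selector.fit_transform(X)                  # drop low-variance columns
X_gauss = np.apply_along_axis(rank_gauss, 0, X_sel)  # Gaussianize column-wise
```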

modeling

  • Label Smoothing (see the loss sketch after this list)

  • Transfer learning: pretrain the NN and ResNet models on the nonscored targets

  • Shallow models

    • For the NN: short epochs, training right up to the limit before the loss becomes NaN

    • For TabNet: n_steps=1, n_shared=1

  • Thresholding NN: input → linear → tanh → NN
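
As an illustration of the label smoothing above, a minimal PyTorch sketch for a multi-label BCE loss; the smoothing strength and class name are placeholders, not necessarily what the solution used:

```python
import torch
import torch.nn as nn

class SmoothBCEwLogits(nn.Module):
    """BCE with logits where the hard 0/1 targets are pulled toward 0.5."""
    def __init__(self, smoothing: float = 0.001):  # strength is a placeholder
        super().__init__()
        self.smoothing = smoothing
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # y_smooth = y * (1 - s) + 0.5 * s, i.e. 0 -> s/2 and 1 -> 1 - s/2
        soft = targets * (1.0 - self.smoothing) + 0.5 * self.smoothing
        return self.bce(logits, soft)
```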

post-processing

  • Ensembling: in particular, the ensemble of TabNet and the NN is effective.

  • 2-stage stacking with MLP, 1D-CNN, and weight optimization (a weight-optimization sketch follows)
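
A sketch of the weight-optimization part of the stacking, assuming out-of-fold predictions from each stage-1 model and an SLSQP search for a convex blend; the function names and solver choice are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy averaged over all samples and targets."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def optimize_blend_weights(oof_preds, y_true):
    """Find non-negative blend weights summing to 1 that minimize OOF log loss.

    oof_preds: list of (n_samples, n_targets) out-of-fold prediction arrays,
    one per stage-1 model.
    """
    n = len(oof_preds)

    def objective(w):
        blend = sum(wi * p for wi, p in zip(w, oof_preds))
        return log_loss(y_true, blend)

    res = minimize(
        objective,
        x0=np.full(n, 1.0 / n),                 # start from a uniform blend
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    )
    return res.x
```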

What didn't work

  • pre-processing

  • modeling

  • post-processing

Code Structure

1. Dataset Structure:

  • Model Weights

  • Inference code for each model

  • Python Packages

2. Install Python packages → run inference with the stage-1 models → get each model's predictions

3. Stacking (MLP, 1D CNN, Weight Optimization)
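
As an illustration of the 1D-CNN stage-2 model, a minimal sketch assuming the stage-1 predictions are stacked as channels and convolved along the target axis; the layer sizes and the channels-as-models layout are assumptions:

```python
import torch
import torch.nn as nn

class CNNStacker(nn.Module):
    """Stage-2 model: treat the (n_models, n_targets) prediction matrix as a
    multi-channel 1D signal and convolve along the target axis."""
    def __init__(self, n_models: int = 5, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_models, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_models, n_targets) -> (batch, n_targets) logits
        return self.net(x).squeeze(1)
```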

  • Not enough time to try:

    • Target encoding on binned g-/c- features

    • XGBoost, CatBoost, and CNN as additional single models (Stage 1)

    • GCN as a stacking model (Stage 2)

    • Netflix Blending

    • PostPredict with LGBM (see the sketch below): we noticed that there are columns that the NN can't predict but LGBM can (e.g. cyclooxygenase_inhibitor), so we came up with the idea of re-predicting only the columns that LGBM is good at, but there was not enough time.
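
A sketch of the PostPredict idea, assuming a per-column LightGBM classifier overwrites the NN's predictions for hand-picked targets; the target list, helper name, and parameters are placeholders:

```python
import numpy as np
from lightgbm import LGBMClassifier

LGBM_TARGETS = ["cyclooxygenase_inhibitor"]  # placeholder list of hard columns

def post_predict(X_train, y_train, X_test, nn_preds, target_names):
    """Re-predict only the columns LGBM is good at and overwrite the NN output."""
    preds = nn_preds.copy()
    for name in LGBM_TARGETS:
        j = target_names.index(name)
        model = LGBMClassifier(n_estimators=300)  # placeholder parameters
        model.fit(X_train, y_train[:, j])
        preds[:, j] = model.predict_proba(X_test)[:, 1]
    return preds
```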

Takeaways

  • Clean Inference Code

    • Ensembling a variety of different models

    • Stacking

[Update] Private 3rd Rank with Various Stacking

Private score: 0.01608 → 0.01599

2D-CNN Stacking

GCN Stacking

  • Adjacency matrix: a matrix of ones divided by (number of classes)²

  • Node features: shape (1, 5)
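
A minimal sketch of the GCN stacker under the stated setup: each target class is a node, its 5 node features are presumably the stage-1 model predictions, and the adjacency matrix is all ones scaled by 1/(number of classes)². The hidden size and class name are placeholders:

```python
import torch
import torch.nn as nn

class GCNStacker(nn.Module):
    """Two-layer GCN: propagate per-class stage-1 predictions over a dense,
    uniformly weighted class graph."""
    def __init__(self, n_classes: int, n_models: int = 5, hidden: int = 32):
        super().__init__()
        # Adjacency matrix of ones divided by (# of classes)^2, as above.
        self.register_buffer("A", torch.ones(n_classes, n_classes) / n_classes ** 2)
        self.fc1 = nn.Linear(n_models, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_classes, n_models); each layer computes A @ X @ W.
        h = torch.relu(self.A @ self.fc1(x))
        return (self.A @ self.fc2(h)).squeeze(-1)  # (batch, n_classes) logits
```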
