GSOC 2012 and scikit-learn

Refactoring linear model code and first benchmark plots

During the last week, I spent some time refactoring the Elastic Net and Lasso classes in scikit-learn. The goal is a single interface for the sparse and dense implementations of these models. The format of the input data determines whether the sparse implementation is used, so the user can work with the same class for both sparse and dense data.
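As a rough illustration of what the unified interface looks like from the user's side, here is a minimal sketch. It assumes a recent scikit-learn release, where the mixing parameter is called `l1_ratio`, and it uses made-up synthetic data; the variable names are only for illustration and are not taken from the refactoring itself.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import ElasticNet

# Synthetic data (an assumption for this sketch): many entries set to zero.
rng = np.random.RandomState(0)
X_dense = rng.randn(50, 20)
X_dense[np.abs(X_dense) < 1.0] = 0.0
true_coef = np.zeros(20)
true_coef[:3] = [1.5, -2.0, 0.5]
y = X_dense @ true_coef + 0.01 * rng.randn(50)

# The same values stored in a scipy sparse format.
X_sparse = sparse.csc_matrix(X_dense)

# One class for both inputs; a tight tolerance so the two fits can be compared.
params = dict(alpha=0.1, l1_ratio=0.5, tol=1e-8, max_iter=10000)
dense_model = ElasticNet(**params).fit(X_dense, y)
sparse_model = ElasticNet(**params).fit(X_sparse, y)

# Largest difference between the two coefficient vectors.
max_diff = np.max(np.abs(dense_model.coef_ - sparse_model.coef_))
print("max coefficient difference:", max_diff)
```

Both calls go through the same `ElasticNet` class; only the type of `X` differs, and the fitted coefficients should agree up to the solver tolerance.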

The first benchmark reports are available thanks to Vlad's work on adapting vbench to scikit-learn (check out his blog post).

The plot shows how the execution time has changed over the past 180 days for fitting an elastic-net penalized logistic regression on the leukemia data set.

Scikit-learn currently wraps the LIBLINEAR library to fit logistic regression models. For comparison, glmnet 1.7.4, run through the R package glmnet, took 188 milliseconds to fit the leukemia data set. Next week will bring more benchmark evaluations on different data sets.
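For readers who want to try a comparable timing by hand, here is a minimal sketch using `timeit`. It is not the vbench setup and does not reproduce the benchmark above: the data are synthetic, their shape (72 samples by 7129 features) is my assumption about the size of the leukemia set, and the model is a plain L2-penalized logistic regression fitted with the LIBLINEAR solver of a recent scikit-learn release rather than an elastic-net penalized one.

```python
import timeit

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; the shape is an assumption, not the real data set.
rng = np.random.RandomState(0)
n_samples, n_features = 72, 7129
X = rng.randn(n_samples, n_features)
y = (X[:, 0] + 0.5 * rng.randn(n_samples) > 0).astype(int)


def fit_once():
    """Fit one logistic regression model with the LIBLINEAR solver."""
    return LogisticRegression(solver="liblinear").fit(X, y)


# Repeat the fit a few times and keep the fastest run to reduce noise.
timings = timeit.repeat(fit_once, repeat=5, number=1)
best_ms = 1000.0 * min(timings)
print(f"best of 5 fits: {best_ms:.1f} ms")
```

Taking the minimum over several repetitions is a common way to reduce the influence of other processes on the measurement; the absolute numbers depend on the machine and are not comparable to the figures quoted above.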


Sunday, June 10, 2012
