I invested some time learning about the HDF5 data format in order to upload some datasets to mldata.org. Sadly, the documentation of mldata.org is very sparse and, in the case of the HDF5 format it uses, outdated as well. I have been in contact with the maintainer, but I still don't have working specs, so I decided to put this task on hold for the moment.
I have a working Python prototype of the new coordinate descent algorithm, which I am now about to integrate step by step into scikit-learn.
import numpy as np


def enet_coordinate_descent2(w, l2_reg, l1_reg, X, y, max_iter):
    # Note: as written, l2_reg is the soft-threshold amount and l1_reg is
    # added to the denominator of the coordinate update.
    n_samples = X.shape[0]
    n_features = X.shape[1]

    norm_cols_X = (X ** 2).sum(axis=0)  # squared norm of each column
    Xy = np.dot(X.T, y)
    gradient = np.zeros(n_features)
    # columns of X^T X, filled in lazily during the first sweep
    feature_inner_product = np.zeros(shape=(n_features, n_features))
    active_set = set(range(n_features))

    # debug
    value_enet_f = 0

    for n_iter in range(max_iter):
        for ii in active_set:
            w_ii = w[ii]  # coefficient before the update

            # initial calculation
            if n_iter == 0:
                feature_inner_product[:, ii] = np.dot(X[:, ii], X)
                gradient[ii] = Xy[ii] - np.dot(feature_inner_product[:, ii], w)

            tmp = gradient[ii] + w_ii * norm_cols_X[ii]
            # np.sign stands in for the fsign helper, which is not defined here
            w[ii] = np.sign(tmp) * max(abs(tmp) - l2_reg, 0) \
                / (norm_cols_X[ii] + l1_reg)

            # update gradients, if coef changed
            if w_ii != w[ii]:
                for j in active_set:
                    # in the first sweep only gradients up to ii exist yet
                    if n_iter >= 1 or j <= ii:
                        gradient[j] -= feature_inner_product[ii, j] * \
                            (w[ii] - w_ii)

        # debug
        # value_enet_f = check_convergence(y, X, w, value_enet_f)
        # print value_enet_f

        # remove inactive features
        tmp_s = set.copy(active_set)
        for j in tmp_s:
            if w[j] == 0:
                active_set.remove(j)

    return w
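To make the per-coordinate update easier to follow, here is a minimal sketch of the soft-thresholding step on its own, together with a naive sweep that keeps a residual instead of the cached gradients and active set used in the prototype. It is only an illustration under assumptions: the names `soft_threshold`, `enet_objective`, `naive_coordinate_sweep`, `l1_penalty` and `l2_penalty` are mine, and the objective 0.5 * ||y - Xw||^2 + l1_penalty * ||w||_1 + 0.5 * l2_penalty * ||w||^2 is inferred from the update rule, not taken from scikit-learn.

```python
import numpy as np


def soft_threshold(z, threshold):
    """Shrink z toward zero by threshold; small values become exactly 0."""
    return np.sign(z) * max(abs(z) - threshold, 0.0)


def enet_objective(w, X, y, l1_penalty, l2_penalty):
    """Assumed objective: squared loss plus L1 and L2 penalties."""
    residual = y - np.dot(X, w)
    return (0.5 * np.dot(residual, residual)
            + l1_penalty * np.abs(w).sum()
            + 0.5 * l2_penalty * np.dot(w, w))


def naive_coordinate_sweep(w, X, y, l1_penalty, l2_penalty):
    """One pass over all coordinates, updating w in place."""
    norm_cols_X = (X ** 2).sum(axis=0)
    residual = y - np.dot(X, w)
    for i in range(X.shape[1]):
        denominator = norm_cols_X[i] + l2_penalty
        if denominator == 0:
            continue  # all-zero column and no L2 penalty: nothing to update
        w_old = w[i]
        # correlation of column i with the residual that ignores feature i
        rho = np.dot(X[:, i], residual) + w_old * norm_cols_X[i]
        w[i] = soft_threshold(rho, l1_penalty) / denominator
        if w[i] != w_old:
            # keep the residual consistent with the new coefficient
            residual -= X[:, i] * (w[i] - w_old)
    return w


# Small synthetic problem (made-up data): only the first feature matters.
rng = np.random.RandomState(0)
X = rng.randn(50, 10)
y = 2.0 * X[:, 0] + 0.1 * rng.randn(50)

w = np.zeros(10)
objective_history = [enet_objective(w, X, y, 1.0, 0.5)]
for _ in range(20):
    naive_coordinate_sweep(w, X, y, 1.0, 0.5)
    objective_history.append(enet_objective(w, X, y, 1.0, 0.5))
```

Because each coordinate update minimizes the assumed objective exactly along that coordinate, `objective_history` should never increase, which makes it a cheap sanity check when comparing against a faster variant.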
This version will be written in Cython to speed things up. I hope that I can beat the execution time of the current implementation soon.