The GSoC’18 projects were announced last week, and I am happy that I will be working on Shogun this summer. In this post, I am pleased to share some details of my project.
Shogun is a versatile machine learning toolbox that offers a wide range of convenient, unified algorithm implementations. It is written in C++ and provides interfaces to other languages via SWIG, which makes it a great place to hack.
My project, Continuous Detoxification, focuses on redesigning and refactoring, with the aim of modernizing Shogun’s codebase and improving the experience for both users and developers. Shogun has been developed for more than ten years, so some parts of it are old and not well designed. For example, parallel cross validation is not thread safe, because some algorithms modify features and the features therefore cannot be shared between threads. So, as the first part of my project, I would like to make features immutable. Several blockers need to be addressed first. Algorithms that rely on mutating features will be refactored, and preprocessors, which are coupled with features, need to be redesigned as well. In addition, views of features and labels will be introduced so that several features or labels objects can share the same underlying data. Some other parts will be taken care of as well for thread safety. Once features are immutable, we can safely enable parallel cross validation.
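To make the idea of immutable features and views a bit more concrete, here is a minimal C++ sketch. It is not Shogun code: `ImmutableFeatures`, `view`, `fold_mean` and `parallel_fold_means` are hypothetical names I made up for illustration, and the "training" step is replaced by a simple mean. The only point it tries to show is that a read-only buffer can be shared by several views, so each fold can be handed to its own thread without copying or locking.

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration only: none of these names are Shogun's real API.
class ImmutableFeatures {
public:
    explicit ImmutableFeatures(std::vector<double> values)
        : data_(std::make_shared<const std::vector<double>>(std::move(values))),
          indices_(data_->size()) {
        // A freshly created object sees every element: 0, 1, 2, ...
        std::iota(indices_.begin(), indices_.end(), std::size_t{0});
    }

    // A view shares the underlying buffer and only stores which elements it
    // sees. Subset indices are relative to this object, so views can be nested.
    ImmutableFeatures view(const std::vector<std::size_t>& subset) const {
        std::vector<std::size_t> mapped;
        mapped.reserve(subset.size());
        for (std::size_t i : subset) mapped.push_back(indices_.at(i));
        return ImmutableFeatures(data_, std::move(mapped));
    }

    std::size_t size() const { return indices_.size(); }
    double operator[](std::size_t i) const { return (*data_)[indices_.at(i)]; }
    bool shares_data_with(const ImmutableFeatures& other) const {
        return data_ == other.data_;
    }

private:
    ImmutableFeatures(std::shared_ptr<const std::vector<double>> data,
                      std::vector<std::size_t> indices)
        : data_(std::move(data)), indices_(std::move(indices)) {}

    std::shared_ptr<const std::vector<double>> data_;  // read-only, shared
    std::vector<std::size_t> indices_;                 // owned by each view
};

// Stand-in for "train and evaluate on one fold": just the mean of the fold.
double fold_mean(const ImmutableFeatures& fold) {
    if (fold.size() == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < fold.size(); ++i) sum += fold[i];
    return sum / static_cast<double>(fold.size());
}

// Evaluate every fold in its own task. Each task owns a view, and all views
// read the same const buffer, so no locking is needed.
std::vector<double> parallel_fold_means(
    const ImmutableFeatures& features,
    const std::vector<std::vector<std::size_t>>& folds) {
    std::vector<std::future<double>> tasks;
    for (const auto& idx : folds) {
        tasks.push_back(std::async(std::launch::async,
            [fold = features.view(idx)] { return fold_mean(fold); }));
    }
    std::vector<double> results;
    for (auto& task : tasks) results.push_back(task.get());
    return results;
}
```

Because the buffer is held through a pointer to `const`, nothing in a view can write to it, which is the property that makes sharing across threads safe in this sketch. How Shogun will actually implement views is still to be worked out in the project.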
Untemplated matrices and vectors are another part, meaning that we will have Matrix and Vector instead of SGMatrix
To share more details about my project with a wider audience outside the community, I will keep this blog updated over the following weeks as the work goes on.