GSoC 2022 call

Last updated on 09 Feb 2022

List of Suggested Projects for GSoC 2022

This is a list of projects suggested by the Catalyst Team for GSoC 2022. This year GSoC focuses on two project sizes: ~175 hours or ~350 hours to complete. The suggestions below should fit one of these sizes, depending on the skill level of the participant. These are only suggestions; if you have your own ideas, please discuss them on the Catalyst Team Slack. We have many talented ML practitioners in the Catalyst Team who would love to mentor good students for GSoC 2022.

Meta tracks: examples, documentation, tests
LR Finder API
Poetry
Metric Learning metrics and benchmark
Self-Supervised Learning benchmark extension
RecSys sequential recommenders
Off-policy RL
On-policy RL

More details on some of the projects listed above appear further down this page.

Timeline

The timeline for GSoC 2022 is here.

How to improve your chances of being accepted

When making the difficult decision about which students to accept, we look for:

Clear and detailed application explaining how you think the project could be done
Relevant prior experience: machine learning, deep learning, PyTorch
Experience contributing to Catalyst or other open source ML/DL projects
Understanding of Git and/or GitHub

Meta tracks: examples, documentation, tests

Like every open-source framework, Catalyst is actively interested in better examples and documentation to improve its adoption. While working on any of the projects listed below, you are welcome to extend our documentation site with an FAQ of your own or with Keras-like examples.

LR Finder API

LR finder is a well-known technique in the ML community for tuning the initial learning rate for maximal performance. While Catalyst already has its own implementation with some extensions, there is room for improvement in the user API.

The goal of this project

revisit the LR finder implementation and add documentation where required
extend the Runner API with a find_lr method (similar to runner.train or runner.predict_loader)
benchmark LR finder on available examples
write a blog post with your findings for the community
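As a rough illustration of what an LR finder does, the sketch below runs an LR range test on a toy one-parameter problem: the learning rate grows exponentially from step to step, the loss is recorded, and the suggested value is the learning rate whose step gave the steepest loss drop. The names lr_range_test, suggest_lr, and quadratic are hypothetical and chosen for this example only; this is not Catalyst's implementation and not the proposed find_lr API.

```python
import math


def lr_range_test(loss_and_grad, w0, lr_min=1e-5, lr_max=10.0,
                  num_steps=100, diverge_factor=4.0):
    """Run plain SGD while growing the LR exponentially; return (lrs, losses)."""
    # Per-step multiplier so that the LR goes from lr_min to lr_max.
    gamma = (lr_max / lr_min) ** (1.0 / (num_steps - 1))
    w, lr, best = w0, lr_min, math.inf
    lrs, losses = [], []
    for _ in range(num_steps):
        loss, grad = loss_and_grad(w)
        # Stop once the loss blows up relative to the best value seen so far.
        if not math.isfinite(loss) or loss > diverge_factor * best:
            break
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        w -= lr * grad  # SGD step with the current LR
        lr *= gamma
    return lrs, losses


def suggest_lr(lrs, losses):
    """Return the LR whose step produced the steepest loss drop per unit log(lr)."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        # losses[i] was measured after the update that used lrs[i - 1].
        slope = (losses[i] - losses[i - 1]) / (math.log(lrs[i]) - math.log(lrs[i - 1]))
        if slope < best_slope:
            best_lr, best_slope = lrs[i - 1], slope
    return best_lr


def quadratic(w):
    """Toy objective (hypothetical): loss (w - 3)^2 and its gradient."""
    return (w - 3.0) ** 2, 2.0 * (w - 3.0)


lrs, losses = lr_range_test(quadratic, w0=0.0)
print("suggested lr:", suggest_lr(lrs, losses))
```

In a real framework the same loop would wrap an optimizer and a data loader instead of a closed-form loss; how that should be exposed through the Runner is exactly the open design question of this project.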

Poetry

The main idea of this project is to move Catalyst from a pip-based to a Poetry-based development environment. It includes working with installation scripts, tests, and contribution guides. See PR#1358 for more information.

Metric Learning metrics and benchmark

The Catalyst Team has always been inspired by Metric Learning. For this reason, Catalyst already has a variety of Metric Learning metrics, including CMCMetric and ReidCMCMetric.

The goal of this project

extend the available metrics with Normalized Mutual Information (NMI)
implement and benchmark Mean Average Precision at R
write a blog post with your findings for the community
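For reference, Mean Average Precision at R (MAP@R) can be sketched in a few lines. For each query, R is the number of other samples sharing its label; the R nearest neighbours are retrieved, and precision is accumulated only at ranks where the retrieved sample is correct. The function name map_at_r and the use of squared Euclidean distance are assumptions made for this example; a Catalyst metric would follow the library's own metric interface.

```python
def map_at_r(embeddings, labels):
    """Mean Average Precision at R over all queries that have at least one match."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(labels)
    scores = []
    for q in range(n):
        # R = number of other samples with the same label as the query.
        r = sum(1 for j in range(n) if j != q and labels[j] == labels[q])
        if r == 0:
            continue  # no relevant items: the query is skipped
        # Rank all other samples by distance to the query.
        ranked = sorted((j for j in range(n) if j != q),
                        key=lambda j: sq_dist(embeddings[q], embeddings[j]))
        hits, total = 0, 0.0
        for rank, j in enumerate(ranked[:r], start=1):
            if labels[j] == labels[q]:
                hits += 1
                total += hits / rank  # precision at this rank, counted only on a hit
        scores.append(total / r)
    return sum(scores) / len(scores)


# Two well-separated clusters: every nearest neighbour has the right label.
print(map_at_r([[0, 0], [0, 1], [10, 10], [10, 11]], [0, 0, 1, 1]))  # 1.0
```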

Self-Supervised Learning benchmark extension

In addition to Metric Learning, Catalyst also has a Self-Supervised Learning benchmark. Nevertheless, it could be improved a bit.

The goal of this project

extend catalyst.contrib.datasets with ImageNet1k
add ImageNet1k support to the current SSL benchmark
add video-based dataset support to the SSL benchmark
revisit the final results
write a blog post with your findings for the community

RecSys sequential recommenders

Catalyst already has many RecSys examples, but they are mainly AutoEncoder-based. For this reason, we are interested in extending Catalyst to a wide variety of sequential recommenders as well.

The goal of this project

implement a synthetic data generator for sequential RecSys recommenders
add RecSysDataset support to catalyst.contrib.data
implement SASRec recommender
implement BERT4Rec recommender
implement RepeatNet recommender
compare available Catalyst recommenders
write a blog post with your findings for the community
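One possible shape for the synthetic data generator is sketched below. It plants a simple, learnable pattern (the next item is usually the current item plus one) and mixes in random jumps as noise, then converts each sequence into (history, next item) training pairs. The function names, the p_follow parameter, and the pattern itself are assumptions for illustration; the project may settle on a very different generator.

```python
import random


def generate_sequences(num_users, num_items, seq_len, p_follow=0.8, seed=0):
    """Generate one item-id sequence per user from a noisy 'next item' pattern."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    sequences = []
    for _ in range(num_users):
        item = rng.randrange(num_items)
        seq = [item]
        for _ in range(seq_len - 1):
            if rng.random() < p_follow:
                item = (item + 1) % num_items  # follow the planted pattern
            else:
                item = rng.randrange(num_items)  # noise: jump to a random item
            seq.append(item)
        sequences.append(seq)
    return sequences


def to_next_item_samples(sequences, min_history=1):
    """Turn each sequence into (history, next_item) pairs for next-item prediction."""
    samples = []
    for seq in sequences:
        for t in range(min_history, len(seq)):
            samples.append((seq[:t], seq[t]))
    return samples


sequences = generate_sequences(num_users=5, num_items=20, seq_len=10)
samples = to_next_item_samples(sequences)
print(len(sequences), len(samples))  # 5 45
```

Because the planted pattern is known, a sequential recommender trained on this data has a clear accuracy ceiling (roughly p_follow for next-item prediction), which makes the generator useful as a sanity check before benchmarking on real datasets.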

Off-policy RL

While Catalyst already has a few off-policy RL examples, several improvements can be made.

The goal of this project

extend available RL examples with n-step returns and categorical/quantile loss
implement TD3 algorithm
implement SAC algorithm
write a blog post with your findings for the community
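To make the n-step return item concrete, here is a minimal sketch of n-step targets for a single trajectory. Each target sums up to n discounted rewards and then bootstraps from a value estimate, unless the episode terminated inside the window. The function name and the argument layout (next_values[t] holding an estimate of the value of the state reached after step t) are assumptions for this example and do not mirror the existing Catalyst RL examples.

```python
def n_step_targets(rewards, next_values, dones, gamma=0.99, n=3):
    """Compute n-step bootstrapped return targets for one trajectory.

    rewards[t]     : reward received at step t
    next_values[t] : value estimate of the state reached after step t
    dones[t]       : True if the episode terminated at step t
    """
    horizon = len(rewards)
    targets = []
    for t in range(horizon):
        ret, discount, last = 0.0, 1.0, t
        for k in range(t, min(t + n, horizon)):
            ret += discount * rewards[k]
            discount *= gamma
            last = k
            if dones[k]:
                break  # do not accumulate rewards past the end of the episode
        if not dones[last]:
            # Bootstrap from the value of the state after the last used step.
            ret += discount * next_values[last]
        targets.append(ret)
    return targets


print(n_step_targets([1.0, 1.0, 1.0], [10.0, 10.0, 10.0],
                     [False, False, True], gamma=0.5, n=2))  # [4.0, 1.5, 1.0]
```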

On-policy RL

While Catalyst already has a few on-policy RL examples, several improvements can be made.

The goal of this project

improve the current implementation with a BatchEnv wrapper
implement TRPO algorithm
implement PPO algorithm
write a blog post with your findings for the community
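The core of PPO is its clipped surrogate objective, which can be sketched without any framework. The probability ratio between the new and old policy is clipped to [1 - eps, 1 + eps], and the loss takes the more pessimistic of the clipped and unclipped terms. The function name ppo_clip_loss and the list-based inputs are assumptions for illustration; a real implementation would operate on tensors and add value and entropy terms.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# With identical policies the ratio is 1, so the loss is minus the mean advantage.
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # -2.0
```

Note the asymmetry: with a positive advantage a ratio above 1 + eps is clipped, while with a negative advantage the same ratio is not, so harmful moves are still penalised in full.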

Sergey Kolesnikov