GSoC 2022 call

List of Suggested Projects for GSoC 2022

This is a list of projects suggested by Catalyst-Team for GSoC 2022. GSoC has focused on 2 types of projects this year: ~175 hours or ~350 hours to complete, the list below contains project suggestions that should meet these criteria depending also on the skill level of the participant. These are only suggestions, and if you have your own ideas, please discuss them on the Catalyst Team Slack. We have a lot of talented ML practitioners in the Catalyst Team who would love to mentor good students for GSoC 2022.

  • Meta tracks: examples, documentation, tests
  • LR Finder API
  • Poetry
  • Metric Learning metrics and benchmark
  • Self-Supervised Learning benchmark extension
  • RecSys sequential recommenders
  • Off-policy RL
  • On-policy RL

See lower down on this page for more details for some of the projects listed above.

Timeline

The timeline for GSoC 2022 is here.

How to improve your chances of being accepted

When making the difficult decision about which students to accept, we look for:

  • Clear and detailed application explaining how you think the project could be done
  • Relevant prior experience - machine learning, deep learning, pytorch
  • Experience contributing to Catalyst or other open source ML/DL projects
  • Understanding of Git and/or GitHub

Meta tracks: examples, documentation, tests

Like every open-source framework, Catalyst is actively interested in examples and documentation improvements to improve its adoption. In such a case during any listed projects below you are welcome to extend our documentation site with FAQ of your own, or Keras-like examples.

LR Finder API

LR finder is a well-known technique in the ML community to tune your initial learning rate for maximal performance. While Catalyst already has its implementation with some extensions there is a place for improvement in terms of user API.

The goal of the project

  • revisit LR finder implementation and add additional documentation if required
  • extend Runner API with find_lr method (like runner.train or runner.predict_loader ones)
  • benchmark LR finder on available examples
  • write a blog post with your findings for the community

Poetry

The main idea of this project is to transfer Catalyst from pip to a poetry-based development environment. It includes working with installations scripts, tests, and contributions guides. See RP#1358 for more information.

Metric Learning metrics and benchmark

Catalyst Team was always inspired with the Metric Learning. For such a reason, we also have a wide variety of ML-based metrics: CMCMetric, ReidCMCMetric.

The goal of this project

  • extend the available metrics with Normalized Mutual Information (NMI)
  • implement and benchmark Mean Average Precision at R
  • write a blog post with your findings for the community

Self-Supervised Learning benchmark extension

In addition to Metric Learning, Catalyst also has a Self-Supervised Learning benchmark. Nevertheless, it could be improved a bit.

The goal of this project

  • extend the catalyst.contrib.datasets with the ImageNet1k
  • add ImageNet1k support to the current SSL benchmark
  • add video-based dataset support to the SSL benchmark
  • revisit the final results
  • write a blog post with your findings for the community

RecSys sequential recommenders

There are many RecSys-based examples in the Catalyst already. One thing to mention - they are mainly AutoEncoders-based. For such a reason we are interested in its adoption for a wide variety of sequential recommenders also.

The goal of this project

  • implement a synthetic data generator for sequential RecSys recommenders
  • add RecSysDataset support to the catalyst.contrib.data
  • implement SAS4Rec recommender
  • implement BERT4Rec recommender
  • implement RepeatNet recommender
  • compare available Catalyst recommenders
  • write a blog post with your findings for the community

Off-policy RL

While Catalyst already has a few off-policy RL examples, several improvements can be made.

The goal of this project

  • extend available RL examples with n-step returns and categorical/quantile loss
  • implement TD3 algorithm
  • implement SAC algorithm
  • write a blog post with your findings for the community

On-policy RL

While Catalyst already has a few on-policy RL examples, several improvements can be made.

The goal of this project

  • improve current implementation with BatchEnv wrapper
  • implement TRPO algorithm
  • implement PPO algorithm
  • write a blog post with your findings for the community