Building the first AI that understands code

We apply neural networks to source code from over 17 million software repositories and 6.6 million developers worldwide.

Where we are today

  • Fetched and analyzed every git repository on GitHub, Bitbucket, and self-hosted cgit instances.
  • Trained neural networks to extract relevant features from code.
  • Implemented most git features in go-git, our open-source library.
  • Currently training neural networks on natural language use.
  • Built the world's fastest WeightedMinHash and k-means clustering implementations.
  • Shown that RNNs with memory can learn from projects, and identified the limitations of that approach.

What is next?

  • Launch a platform where developers find the right teams and projects to join.
  • Fetch and process every change to public git repos in close to real time.
  • Finalize our first trained models on all 6.6 million developers.
  • Extend TensorFlow with new loss functions and apply them to code style.
  • Move infrastructure to bare-metal servers with GPUs and 1.4 PB of storage.
  • Reach full feature parity between go-git and libgit2.