SourceCodeAI Archives - Techno Blender

SourceCodeAI — how to handle Train-Inference mismatch | by Ori Abramovsky | May, 2022

Jessie Hobb May 25, 2022

Photo by Alex Dumitru from Pexels

Source code AI has many unique features that differentiate it from more general NLP applications (such as the common practice of heavily processing the input before feeding it to the model). One of its main challenges is that while generating source code training datasets seems quite easy (‘just crawl GitHub’), the reality includes many hidden pitfalls to avoid, among them the fact that such highly available sources (like GitHub, Bitbucket or Stack Overflow) commonly differ from the…