I really enjoy academic conferences and meetups. There's just an overwhelmingly positive attitude that's hard to replicate elsewhere. People are genuinely interested in you and what you have going on. No one is there to sell you a service or promote their latest developer tool. People are there to exchange ideas.
That's why I was really excited when the Danish ML/AI academics I follow on Bluesky started promoting a symposium on NLP in Copenhagen. There were speakers from places whose work I deeply respect, such as HuggingFace, Cohere, and the Allen Institute. I quickly registered my interest and blocked out my work calendar.
My takeaways from the talks were:

- Evaluation of LLMs continues to be a hard problem
- Small language models are increasingly competitive
- High-quality data is hard to obtain and use at scale

I get the feeling that this is what I expected and wanted to hear, and that part of my excitement comes from having those impressions confirmed by industry and academic experts.
My favourite moment from the talks came from Kyle Lo. He shared some peculiar spikes in the loss during model training. It turned out that at least one of the spikes was caused by data from the r/microwavegang subreddit, where posts consist of long strings like "Mmmmmmmmmmmm", which resulted in a high loss. It was nice to hear about his pragmatic approach of just looking at the data and fixing the problem with a simple filter.
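I don't know what the actual filter looked like, but in spirit it could be as simple as this little sketch (the function name and threshold are mine, not theirs):

```python
def is_repeated_char_spam(text: str, threshold: float = 0.8) -> bool:
    """Flag documents dominated by a single repeated character,
    like the "Mmmmmmmm" posts from r/microwavegang."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return False
    # Share of the document taken up by its most frequent letter.
    top_share = max(letters.count(c) for c in set(letters)) / len(letters)
    return top_share >= threshold

docs = ["Mmmmmmmmmmmmmmmm", "A normal sentence about microwaves."]
kept = [d for d in docs if not is_repeated_char_spam(d)]
print(kept)  # ['A normal sentence about microwaves.']
```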
My appreciation for Andrej Karpathy's teaching started while I was working on my master's thesis. His Neural Networks: Zero to Hero series really helped me build an intuitive understanding of some fundamental concepts in deep learning.
What excites me most about Karpathy is that his name has become associated with quality, to the point that the directors and senior managers at my company are listening to him. With so many high-noise inputs, especially at that layer, it's really nice to see that Andrej's work is cutting through the noise.
Our Director of XR Products had this to say about the presentation: "There were lots of good points in this talk. The one that stuck out to me was this mental model of how software development is shifting to what he calls Software 3.0: a software development model where neural networks (typically deep learning models) are trained on large datasets to learn patterns, decision logic, and behavior, rather than being explicitly programmed line-by-line by developers. So it basically allowed Tesla's automated self-driving software to learn much faster from datasets than what could be constructed by Software 1.0 hard-coded logic (C, Java, Python, etc.) and Software 2.0 (machine learning methods with decision trees).
AND, that while Software 3.0 is expanding, as in the picture below, he is stating that to ensure we can control and shape what we want the software and application to do, we actually need a blend of Software 1.0, 2.0, and 3.0, potentially all working together, to produce the right result. I think that mental model of the combination of the three, working in concert, was important for me to remember."
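To make that blend concrete for myself, here's a toy sketch of the three paradigms cooperating in one pipeline. The scenario and function names are entirely made up; the point is just to show where each kind of "program" lives:

```python
def validate_order(order: dict) -> bool:
    # Software 1.0: explicit, hand-written rules.
    return order["quantity"] > 0 and order["total"] >= 0

def fraud_risk(order: dict) -> float:
    # Software 2.0: stand-in for a trained classifier; in a real system
    # the "program" lives in learned weights, not in this if-expression.
    return 0.9 if order["total"] > 10_000 else 0.1

def draft_reply_prompt(order: dict, risk: float) -> str:
    # Software 3.0: the "program" is a natural-language prompt for an LLM.
    return (f"Write a short, friendly note to a customer whose order of "
            f"{order['quantity']} items was flagged with a fraud risk of {risk:.0%}.")

order = {"quantity": 3, "total": 12_000}
if validate_order(order):                   # 1.0 gatekeeps
    risk = fraud_risk(order)                # 2.0 estimates
    print(draft_reply_prompt(order, risk))  # 3.0 would hand this to an LLM
```

In a real system the 2.0 part would be actual model weights and the 3.0 part would go through an LLM API, but the shape of the blend is the same.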
Propagating ideas and mental models through an organization can be difficult. Having resources like this talk from Andrej makes it easier to build a shared vocabulary around some pretty abstract ideas.