23

I got called out for my sloppy training data back in 2020

A senior dev looked at my text classification model and said, 'Your labels are a mess, you're just teaching it your own bad habits.' He was right. I had been rushing and ended up with inconsistent categories, like 'customer complaint' and 'angry email' for the same kind of message. I spent the next month cleaning 50,000 entries by hand before retraining. Anyone else have a brutal code review that actually improved your process?
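For anyone wondering what merging duplicate categories can look like in code, here is a minimal sketch. It is not the cleanup described above, which was done by hand. The alias map and the function names are hypothetical, and it assumes the data is a list of (text, label) pairs.

```python
# Hypothetical alias map: every known variant points at one canonical label.
LABEL_ALIASES = {
    "angry email": "customer complaint",
    "customer complaint": "customer complaint",
}

def normalize_label(raw: str) -> str:
    """Lowercase, collapse whitespace, and map known aliases to one label."""
    key = " ".join(raw.strip().lower().split())
    # Labels without an alias pass through unchanged (after cleanup).
    return LABEL_ALIASES.get(key, key)

def normalize_dataset(rows):
    """Return (text, canonical_label) pairs for an iterable of (text, label)."""
    return [(text, normalize_label(label)) for text, label in rows]
```

A map like this only catches variants someone has already spotted, so it complements a manual pass rather than replacing one.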
2 comments

keith943 4d ago
Remember it's not just about being careful from the start. You gotta build checks into the process itself, like a quick label check before you even start training.
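A rough sketch of what a quick label check before training might look like, assuming the data is a list of (text, label) pairs and there is an agreed set of allowed labels. The function name, the `min_count` threshold, and the specific checks are assumptions for illustration, not a standard recipe.

```python
from collections import Counter, defaultdict

def check_labels(rows, allowed_labels, min_count=2):
    """Return a list of problems found in a list of (text, label) pairs."""
    problems = []
    counts = Counter(label for _, label in rows)

    # 1. Labels that are not in the agreed label set.
    for label in sorted(counts):
        if label not in allowed_labels:
            problems.append(f"unknown label: {label!r}")

    # 2. Allowed labels with too few examples to learn from.
    for label in sorted(allowed_labels):
        if counts[label] < min_count:
            problems.append(f"too few examples for {label!r}: {counts[label]}")

    # 3. The same text (ignoring case and outer whitespace) labeled two ways.
    labels_by_text = defaultdict(set)
    for text, label in rows:
        labels_by_text[text.strip().lower()].add(label)
    for text, labels in labels_by_text.items():
        if len(labels) > 1:
            problems.append(
                f"conflicting labels {sorted(labels)} for text: {text[:40]!r}"
            )
    return problems
```

Running something like this at the top of the training script and refusing to train when the list is non-empty makes the check part of the process instead of something you have to remember.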
6
the_andrew
Man, what did you think about labels before that happened? I used to see them as just a quick step to get to the fun coding part... but a review like that totally flipped it for me. It showed me that messy data isn't just a small problem, it's the whole foundation. Now I'm way more careful from the start, even if it takes longer.
3