Trigger Warning: This report discusses trauma, mental illness, addiction, and eating disorders. There are mentions of suicide and self-harm and descriptions of graphic content, violence, explicit sexual situations, child sexual abuse, ethnic cleansing, and animal cruelty.

I am fully outraged at what I’ve just read. This is more a short commentary than a blog post (I promise the capitalism one is on the way), but I write it to emphasize how much it disgusts me that this is the current state of our intellectual movement. I hope it spurs action from all of us in some way, whether or not we’re involved with the industry.

summary#

OpenAI, Perplexity, Anthropic, xAI, and many more “AI companies” are, at their core, big data companies. There is very little proprietary about their fundamental algorithms; what sets them apart is their ability to process and refine data. Their prompting pipelines for LLMs and their censoring infrastructure are what allow them to be so dominant as companies and to build the infrastructure surrounding their foundation models. I used to think they were simply very good at the prompting science they had uncovered, and that they had real, proprietary, fundamental gains as a result of unique and innovative design choices.

However, I just read an investigation (and disappeared down a 2-hour rabbit hole) into the nature of these “data pipelines”. Specifically, I read the story of Fasica Berhane Gebrekidan, a “data translator” hired by a nondescript company for “translation work”. Fasica’s job was simple: watch 9 hours of harrowing content depicting sexual abuse, violence, animal cruelty, and worse things I won’t describe in detail, and annotate it. Every single day, she would wake up and annotate some of the most life-scarring footage and text she had ever encountered, for $1.50. Today, Fasica cries instead of conversing. Life itself is too painful for her to interact with the world, and everything serves as a constant reminder of the thousands of horrors she has effectively witnessed firsthand.

Fasica is not alone: millions of people in impoverished countries are offered “data translation” jobs and set to watch recordings of simulated and real interactions. They work around the clock, spending 9 hours watching the data and then another 9 hours labeling it. Most make barely enough to live on each day, or have their paychecks withheld based on the work they output, measured against a standard that is continuously changing. From daily, unending exposure to this harmful content, they develop various disorders.

Various nonprofit organizations and investigative agencies have recently brought this to light. In the past month alone, countless investigations have finally been published highlighting how these large corporations take advantage of workers in third-world countries. Largely due to currency imbalances and the strength of the USD, shell companies offering bespoke and ostensibly valid roles such as “data translator” and “content moderator” are just facades for transforming entire communities into living, breathing computers, run just like data centers.

“The great machines are filled with ghosts and whispers.”

speechless#

Historically, I’ve never really supported or opposed these large AI corporations. I regarded them with general distrust, but I had no choice but to acknowledge their role in making the study of artificial intelligence a priority for the world. But this is not some “beauty of innovation” moment - no, this is nothing short of disgusting and abominable. It makes me extremely grateful to be involved in research that pushes for progress beyond “clean the data” or “buy a bigger datacenter”. I’m iterating on fundamental attributes of human intelligence, and using math and information theory to bring those attributes to algorithms for safe and interpretable learning. The work that I’m doing is different, and I hope it is for the better.

If we as a society want to produce a culture of innovation that we are proud of, we cannot regress to exploiting human beings for that progress.

We cannot take a stand simply by ignoring these products. At the top level, every large corporation is going to become LLM- and gen-AI-integrated. What we can do is promote research into thoughtfully and intentionally created AI, AI that advances by developing the fundamental algorithms behind it. It’s why I work in cognitive science, and it’s why I eventually hope to get involved in politics. We need to bring this issue to legislation, so that the work of these companies can be monitored and their practices regulated. We need to stop placing emphasis on progress for capital gain and start emphasizing progress that promotes humanitarian growth without a dehumanizing process. We need to champion training pipelines that let us as consumers put in the work we want to reap, so that we can be as involved in the creation of this AI as the corporations are. We need transparency. And above all else, we need justice for the communities wounded by these companies.

I donated to data-workers.org, and I highly encourage you to find some way to donate to similar causes or get involved with your legislative group. I’m still figuring out where my place is in all of this. Though my ideologies around AI don’t inherently align with those championing these data pipelines, I can’t help but worry about whether my own work will contribute to these harms as well.

Perhaps that worry is a good thing, though. If you are reading this and you are using AI in your day-to-day life, you shouldn’t necessarily feel guilty. You definitely shouldn’t feel guilty if you’re passionate about studying artificial intelligence and innovating on the standards we currently have. But don’t ever stop thinking about how you are developing your career for the good of humanity. Keep that focus on community at the center of every single thing you work on.

It’s up to us to think deeply and intentionally, not just about applications, but about theory that allows for kinder applications where everyone can reap the rewards. We have to care about one another; otherwise, in another couple of decades, we may tear each other apart in the race for singularity and AGI.

-Karthik