AI Tools Illegally Training on Real Images of Children

AI technology is revolutionizing various sectors, bringing incredible advancements and convenient tools. However, a dark side has emerged: AI models are being trained on real images of children without consent. A recent report by Human Rights Watch exposes this troubling reality, revealing how a popular AI training dataset is exploiting the faces of Brazilian children.

Unveiling the Unethical Practice

Human Rights Watch reported that over 170 images of Brazilian children, along with identifying personal details, have been scraped and used to train AI models without the children's or their families' knowledge or consent. The photos were found in an open-source training dataset and include content posted as recently as 2023 and as far back as the mid-1990s. Such data collection practices raise severe concerns about privacy and consent.

The Culprit: LAION-5B Dataset

The dataset in question, LAION-5B, was developed by the German nonprofit organization LAION. Built on data scraped from Common Crawl, a vast repository of web-crawled data, LAION-5B contains roughly 5.85 billion pairs of image links and captions. It has become a popular source for training AI models, including Stability AI’s Stable Diffusion image generation tool.
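To make the dataset’s structure concrete, here is a minimal sketch of how a researcher might inspect one shard of LAION-style metadata. It assumes a local Parquet file with URL and TEXT columns, the general shape in which LAION releases its metadata; the file name and the keyword filter are hypothetical, not part of any official tooling.

```python
# Minimal sketch: inspecting a LAION-style metadata shard.
# Assumption: a local Parquet file with URL and TEXT columns
# (illustrative file name; LAION distributes link-caption
# metadata, not the images themselves).
import pandas as pd

shard = pd.read_parquet("laion_shard_0000.parquet", columns=["URL", "TEXT"])
print(f"{len(shard):,} link-caption pairs in this shard")

# Crude first-pass filter for captions mentioning children --
# the kind of coarse screen a researcher might apply before
# manual review.
keywords = "child|kid|toddler|baby"
mask = shard["TEXT"].str.lower().str.contains(keywords, na=False)
print(shard.loc[mask, ["URL", "TEXT"]].head())
```

Note that each row points to an image hosted elsewhere on the web. Removing a row from the dataset does not remove the photo at the other end of the link, a point the article returns to below.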

Violation of Privacy

Hye Jung Han, a children’s rights and technology researcher at Human Rights Watch, discovered the images and expressed her concerns. She said, “Their privacy is violated in the first instance when their photo is scraped and swept into these datasets. AI tools trained on this data can create realistic imagery of children, enabling malicious actors to manipulate them however they want.”

Sources of Images

Many of the exploited images were taken from mommy blogs and other personal maternity and parenting blogs, as well as YouTube videos featuring children. Most of these images were posted with an expectation of privacy, shared within small circles of family and friends. Yet they were scraped and used for AI training, raising serious ethical questions.

Response and Actions

LAION has acknowledged the issue and confirmed to Human Rights Watch that the identified images existed in its dataset. The organization has taken steps to remove them, working with bodies such as the Internet Watch Foundation, the Canadian Centre for Child Protection, and Stanford University. LAION’s spokesperson, Nate Tyler, stated, “We’re working to remove all known references to illegal content from the dataset.”

Legal and Ethical Implications

This unethical practice has caught the attention of various stakeholders, including YouTube. The platform’s spokesperson, Jack Malon, reiterated that scraping YouTube content without authorization violates the platform’s Terms of Service, and said YouTube continues to take action against this kind of abuse.

However, the issue is far from resolved. In December 2023, researchers at the Stanford Internet Observatory found that LAION-5B contained child sexual abuse material. This highlights the potential dangers of using children’s images not just for AI training but also for creating harmful content like deepfakes.

The Larger Problem

Hye Jung Han expressed her concern about the broader implications. Beyond the immediate risk of generating harmful content, there is a possibility that other sensitive details like locations or medical data could be exposed. “Children should not have to live in fear that their photos might be stolen and weaponized against them,” she emphasized.

A Global Issue

What Han found is likely only a fraction of the problem. Her research examined a tiny slice of the dataset, suggesting that many more such images remain undetected. It is not just Brazilian children at risk; children around the world could find their images and personal data misused in this way.

Existing Images: A Continuous Threat

Even when links are removed from the LAION dataset, the underlying photos remain on the internet, so deleting entries from one database does not fully solve the problem. LAION’s Tyler pointed out, “This is a larger and very concerning issue, and as a nonprofit volunteer organization, we will do our part to help.”

Need for Government Intervention

Hye Jung Han believes that the responsibility to protect children shouldn’t fall on them or their parents; governments and regulatory bodies need to take charge. In Brazil, lawmakers are considering regulations to control deepfake creation. In the US, Representative Alexandria Ocasio-Cortez has introduced the DEFIANCE Act, which would allow victims of non-consensual sexually explicit deepfakes to sue those who create or distribute them.

Protecting Our Children

Technology companies and regulatory bodies are responsible for protecting children’s digital privacy and security, and robust legal frameworks are essential to prevent such unethical use of data. While parents can exercise caution when posting children’s images online, the ultimate responsibility lies with regulators and tech companies to ensure safe and ethical AI practices.

Conclusion

The recent findings by Human Rights Watch on the unethical practices surrounding AI training datasets raise critical concerns. As AI continues to evolve, so too must our vigilance and regulatory frameworks. By collaborating and implementing strong legal protections, we can ensure that AI technology develops responsibly, safeguarding the privacy and rights of individuals, especially vulnerable children.
