Monday, June 17, 2024

    Human Rights Watch Flags Images of Children in AI Training Datasets

    Personal photos of Brazilian children are being used in AI training datasets without their consent, an investigation by Human Rights Watch (HRW) has found. The images were discovered in LAION-5B, an open-source dataset used to train popular text-to-image AI models like Stable Diffusion.

    HRW found at least 170 photos across 10 Brazilian states, presumably posted on private blogs and social media sites and scraped off the web, many of which contained personally identifiable information about the children, like names and locations.

    The report reviewed less than 0.0001% of the entire dataset, suggesting that the true number may be far higher. It describes the images as portraying intimate family moments such as childbirths and birthday parties, as well as scenes from school classrooms.

    Potential Risks

    HRW pointed out the risk that AI models trained on the dataset could be used to generate identical copies of the images, thereby infringing on the children’s privacy and safety. Furthermore, the models could also be used to create deepfakes depicting them, and even Child Sexual Abuse Material (CSAM). 

    The organisation argued, “Once their data is swept up and fed into AI systems, these children face further threats to their privacy due to flaws in the technology. AI models, including those trained on LAION-5B, are notorious for leaking private information; they can reproduce identical copies of the material they were trained on, including medical records and photos of real people. Guardrails set by some companies to prevent the leakage of sensitive data have been repeatedly broken.”

    The organisation has asked for a prohibition on scraping children’s personal data into AI systems due to the privacy risks involved and the potential for new forms of misuse. It has also asked for a prohibition on the nonconsensual digital replication or manipulation of children’s likenesses and for mechanisms that would allow victims to seek meaningful justice and remedy.

    Inadequacy of Data Protection Law

    Pointing out the inadequacy of Brazil’s data protection law, HRW urged that the AI regulations proposed by the Brazilian Congress incorporate data privacy protections for children.

    “Children should not have to live in fear that their photos might be stolen and weaponized against them,” said Hye Jung Han, children’s rights and technology researcher and advocate at Human Rights Watch. “The government should urgently adopt policies to protect children’s data from AI-fueled misuse.”

    Why This Matters

    The dataset in question is LAION-5B, an open-source dataset promoted by the nonprofit LAION foundation and built by scraping data from the web. In December last year, Stanford researchers found at least 1,008 CSAM images within the dataset, possibly scraped from adult websites. Multiple other studies have also shown the presence of hateful content, including examples of “rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content.”

    LAION-5B is also used to train the popular text-to-image model Stable Diffusion, which has been used to create synthetic CSAM in at least one instance. Last month, the FBI arrested an American citizen for creating 13,000 “hyper-realistic images of nude and semi-clothed prepubescent children” through Stable Diffusion.

    The presence of children’s images in the dataset thus raises several concerns. Can an AI model be prompted to reproduce an exact copy of an image from its training data? HRW considers this plausible. Further, is it easier to create synthetic CSAM from AI models trained on datasets containing images of children? And if so, what can be done to prevent it?

    The post Human Rights Watch Flags Images of Children in AI Training Datasets appeared first on MEDIANAMA.
