Wednesday, July 24, 2024

    OpenAI to Build Media Manager to Protect Content Creators’ Choice, Here’s What Remains Unclear

OpenAI is building a Media Manager to enable content creators and content owners to choose how their works are used for artificial intelligence (AI) purposes. In a blog published on May 7, the company said that the Media Manager will:

    “…enable creators and content owners to tell us what they own and specify how they want their works to be included or excluded from machine learning research and training. Over time, we plan to introduce additional choices and features.”

    The blog further stated that the tool will help OpenAI identify copyrighted text, images, audio, and video across multiple sources and reflect creator preferences.

    OpenAI also said that it has partnered with news publishers such as the Financial Times, Le Monde, Prisa Media, and Axel Springer to display their content in ChatGPT and to train the bot to generate relevant publisher content for users.

    OpenAI claims its AI models do not “regurgitate” content

    Most of OpenAI’s blog appears to be an attempt to offer clarification on the unauthorised use of publishers’ content, in the wake of a slew of lawsuits against the company. The blog stated that OpenAI’s models do not repeat or “regurgitate” content.

    “AI models can state facts, which are in the public domain. If on rare occasions a model inadvertently repeats expressive content, it is a failure of the machine learning process. This failure is more likely to occur with content that appears frequently in training datasets, such as content that appears on many different public websites due to being frequently quoted,” the blog noted, adding that the company is “making improvements” to prevent such repetition.

    This is important in the context of the New York Times lawsuit against OpenAI, in which the publication alleged that OpenAI’s ChatGPT and Microsoft’s Bing were able to “generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style,” and that Bing also “generates responses that contain verbatim excerpts and detailed summaries of Times articles.” The publication also shared multiple examples of ChatGPT producing text very similar to its articles without any attribution. OpenAI refuted the allegations, stating that The New York Times had manipulated the prompts to produce the regurgitations.

    However, a recent lawsuit by eight publications in the United States spotlights the ability of AI models to memorise content. The lawsuit explained that, given the right prompt, the AI models would repeat large portions of the materials they were trained on. This, they said, showed that LLM parameters “encode retrievable copies of many of those training works.”

    Will Media Manager protect publishers’ copyrighted works?

    OpenAI claims that it’s building a tool to identify copyrighted text, images, audio, and video across multiple sources and reflect creator preferences. Notably, the news publications suing OpenAI have alleged that the company intentionally removed the copyright management information from publications’ works while scraping them, training its models on them, and distributing unauthorised copies of them. This, they said, concealed OpenAI’s infringement of copyrighted works and, ultimately, end-users’ infringement when the models generated copies of publications’ work.

    Secondly, as Reid Southen, an artist and illustrator, rightly pointed out on X, if OpenAI can develop tools to identify copyrighted data, those capabilities should have been applied during scraping itself, instead of offering creators an opt-out option after the fact.

    The company also stated in its recent blog post that its AI models do not retain access to or store the data analysed in training, and merely generate outputs based on examining relationships in the available information. However, AI experts have pointed to a blog by OpenAI engineer James Betker, which underlined the importance of the dataset in determining model behaviour. This raises questions about OpenAI’s claims about how copyrighted works are handled when publicly available data is used for training AI models.

    “It’s becoming awfully clear to me that these models are truly approximating their datasets to an incredible degree….What this manifests as is – trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point,” Betker wrote.

    The post OpenAI to Build Media Manager to Protect Content Creators’ Choice, Here’s What Remains Unclear appeared first on MediaNama.
