Saturday, July 20, 2024

    Here’s What a Recent Amnesty Report Says About Algorithmic Decision-Making in India; What it Missed

    Amnesty International published a report on April 30, 2024, spotlighting the risks of using technologies that enable algorithmic decision-making for governance and welfare delivery, and the lack of transparency with which governments are increasingly adopting artificial intelligence-based systems. Amnesty’s examination of the technology comes in light of an investigative report published by Al Jazeera in January 2024, revealing how an algorithm-based system, ‘Samagra Vedika’, was instrumental in denying several eligible beneficiaries access to food security schemes in the Indian state of Telangana.

    Amnesty’s report highlights that in an attempt to control welfare fraud, several governments are turning to creating digital profiles of citizens through privacy-intrusive data collection methods. This data is then run through technologies like AI, machine learning, big data analytics, and other algorithm-based tools to identify fraud and optimise resource allocation. Based on Al Jazeera’s reporting, Amnesty takes a deeper look at the technology behind Telangana’s Samagra Vedika and the technical factors that may in fact exclude rightful beneficiaries from social security schemes.

    Firstly, what’s the Samagra Vedika project about?

    In 2014, the Telangana government conducted a ‘Samagra Kutumba Survey’ to create a master database of every household in the state. Based on this data, the government envisaged the Samagra Vedika project to integrate data about every individual across various departments to obtain a “360-degree view” of any citizen.

    Several Indian State governments have undertaken similar family database programmes to establish platforms that can essentially pull out all kinds of data about a citizen of that particular State from different government departments and examine whether the individual is eligible for a specific government scheme. These States include Haryana, Maharashtra, Tamil Nadu, Uttar Pradesh, and the UT of Jammu & Kashmir. The stated objective of deploying an algorithm-based system is to prevent fraud in the welfare delivery system and to de-duplicate data in government databases used for e-governance.

    What does Al Jazeera’s report say?

    Al Jazeera’s report traces the story of Bismillah Bee, a 67-year-old widowed woman living in an urban slum in Telangana with 12 members of her family, who was denied subsidised foodgrains due to an erroneous rejection by the Samagra Vedika system. In 2021, the algorithm had wrongly tagged Bee’s husband as a car owner, despite the family being listed as “below-poverty-line” in the census record. This is because the algorithm mistook Bee’s husband Syed Ali, a rickshaw puller, for Syed Hyder Ali, who was the car owner.

    The report makes critical revelations about how algorithmic decision-making overrode the human intervention processes that preceded it in Bee’s struggle to access the fundamental right to food security. Importantly, the report revealed that between 2014 and 2019, Telangana cancelled more than 1.86 million existing food security cards and rejected 142,086 fresh applications without any notice. Many of these rejections were based on faulty data or bad algorithmic decisions by Samagra Vedika.

    Al Jazeera also found that the Samagra Vedika system was developed by Posidex Technologies Private Limited and employs machine learning and ‘entity resolution technology’ to uniquely identify a person. However, there’s little to no information in the public domain about the internal functioning of the technology. Further, Right to Information requests to the State for the source code that runs the programme and the data used by the system for making decisions were denied, citing the company’s rights over such information.

    What is Entity Resolution?

    Amnesty’s report describes Entity Resolution as a technique for finding records, including details such as name and other personal information, about a real-world entity or a citizen across multiple data sources. It further explained:

    “In practice, it involves selecting and comparing pairs of records to determine whether they are a match or not. It can be further divided into the subcategories of “Record Linkage” and “Deduplication”. Record Linkage focuses on finding similar records in two or more datasets, while Deduplication aims to identify matches within a single dataset.”
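    The distinction between the two subcategories can be sketched in a few lines of Python. This is purely an illustrative toy, not the actual Samagra Vedika implementation (which is not public); the records, field names, and similarity threshold here are all invented, and a simple string-similarity ratio stands in for whatever matching logic a real system uses.

    ```python
    # Toy entity resolution sketch using only the standard library.
    # All records and the 0.85 threshold are invented for illustration.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        """Normalised string similarity between two field values (0.0 to 1.0)."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
        """Compare a pair of records on common identifiers and decide whether
        they refer to the same real-world entity."""
        score = (similarity(rec_a["name"], rec_b["name"])
                 + similarity(rec_a["address"], rec_b["address"])) / 2
        return score >= threshold

    # Record Linkage: finding similar records across two or more datasets.
    ration_cards = [{"name": "Asha Devi", "address": "12 MG Road"}]
    vehicle_db   = [{"name": "Asha Devi", "address": "12 M.G. Road"}]
    linked = [(a, b) for a in ration_cards for b in vehicle_db if is_match(a, b)]

    # Deduplication: finding matches within a single dataset.
    records = [{"name": "R. Kumar", "address": "4 Park Lane"},
               {"name": "R Kumar",  "address": "4 Park Lane"},
               {"name": "S. Rao",   "address": "9 Hill St"}]
    dupes = [(i, j) for i in range(len(records))
             for j in range(i + 1, len(records))
             if is_match(records[i], records[j])]
    ```

    Here the record-linkage pass pairs the ration card holder with her entry in the vehicle database despite the differently punctuated address, while the deduplication pass flags the two spellings of “R. Kumar” within one dataset.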

    How does Entity Resolution work?

    Jayesh Ranjan, Telangana IT and Industries secretary, told MediaNama in 2020 that Samagra Vedika is an algorithm that can obtain information from at least 30 departments, including GST, treasury, and tax, but does not merge databases. A presentation presumably prepared by the state IT department showed that Samagra Vedika has access to data on electricity bills, water bills, property records, pensions, vehicle databases, ration cards, etc.

    Amnesty International analysis showed that the government aimed to view a consolidated profile of every citizen through the Samagra Vedika system, with common identifiers including the name, address, and date of birth. However, inconsistencies in data entry methods in different databases can pose greater challenges in matching corresponding records. Here’s where entity resolution comes into the picture as it trains an algorithm to “systematically compare pairs of records based on common identifiers and designate a score as to how likely they are to be the same person”.

    It involves three phases:

    • Preprocessing and blocking: Reducing the computational time needed to compare records by cleaning the data and narrowing down the comparisons to be made, discarding pairs that are clearly not matches.
    • Comparison and matching: Applying machine-learning techniques to assess the similarity between the pairs of records and “classify or predict” how likely the pair match with each other.
    • Clustering: A clustering algorithm is used to group records that are determined to refer to the same entity.
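    The three phases above can be sketched end-to-end in Python. To be clear, this is a hypothetical miniature, not Samagra Vedika’s or EntityMatch’s actual code: the records, the first-token blocking key, the 0.9 threshold, and the union-find clustering are all assumptions chosen to keep the example self-contained.

    ```python
    # Illustrative sketch of the three entity-resolution phases, using only
    # the standard library. Data, blocking key, and threshold are invented.
    from difflib import SequenceMatcher
    from itertools import combinations
    from collections import defaultdict

    records = [
        {"id": 0, "name": "syed ali",       "address": "old city"},
        {"id": 1, "name": "syed hyder ali", "address": "banjara hills"},
        {"id": 2, "name": "syed ali",       "address": "old  city"},
        {"id": 3, "name": "bismillah bee",  "address": "old city"},
    ]

    # Phase 1 -- preprocessing and blocking: normalise whitespace and case,
    # then only compare records sharing a cheap blocking key (first name token).
    def clean(rec):
        return {k: " ".join(v.lower().split()) if isinstance(v, str) else v
                for k, v in rec.items()}

    blocks = defaultdict(list)
    for rec in map(clean, records):
        blocks[rec["name"].split()[0]].append(rec)

    candidate_pairs = [pair for block in blocks.values()
                       for pair in combinations(block, 2)]

    # Phase 2 -- comparison and matching: score each candidate pair on common
    # identifiers and classify it against a tolerance threshold.
    def score(a, b):
        return (SequenceMatcher(None, a["name"], b["name"]).ratio()
                + SequenceMatcher(None, a["address"], b["address"]).ratio()) / 2

    THRESHOLD = 0.9
    matches = [(a["id"], b["id"]) for a, b in candidate_pairs
               if score(a, b) >= THRESHOLD]

    # Phase 3 -- clustering: group matched records into entities (union-find).
    parent = list(range(len(records)))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for a, b in matches:
        parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for rec in records:
        clusters[find(rec["id"])].add(rec["id"])
    ```

    Blocking means records 0, 1, and 2 (all beginning “syed”) are compared against each other, but never against record 3, which saves comparisons; the matching phase then merges only the two “syed ali” records into one entity cluster.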

    In the case of Samagra Vedika, the Amnesty International report stated that due to a lack of transparency regarding the documentation and code, the system’s internal functioning is unknown. But existing information shows that Samagra Vedika used a combination of attributes like name and address, alongside date of birth, phone number, and father’s name, to link the data and compare records.

    Defining the matching variables used to link different records is a challenging task, especially given that there are innumerable variations in people’s names and in the ways the same names are recorded, owing to the diversity of languages and regional dialects. The report explains that the machine learning process requires setting a tolerance threshold which determines whether there is a match or not. For Samagra Vedika, a “positive” means a fraudulent or duplicate application, whereas a “negative” means a legitimate one. A false positive occurs when the system labels a legitimate application as fraudulent, and a false negative occurs when the system tags a fraudulent application as legitimate.
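    Name variation makes threshold-setting genuinely delicate: a spelling variant of one person’s name and two different people’s similar names can produce similarity scores in the same narrow band. The names below are invented examples (echoing the kind of confusion in the Syed Ali case), and the scores come from a generic string-similarity measure, not from any scoring the actual system uses.

    ```python
    # Why name variation makes matching hard: a genuine spelling variant and
    # two distinct people's similar names can score very close together, so
    # any single cutoff risks a false positive one way or a false negative
    # the other. Names here are hypothetical.
    from difflib import SequenceMatcher

    def sim(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    same_person    = sim("Mohammed Rafi", "Mohd. Rafi")   # one person, two spellings
    different_ppl  = sim("Syed Ali", "Syed Hyder Ali")    # two distinct people
    # Both scores land near 0.7-0.8: a threshold of 0.79 would miss the
    # genuine variant, while 0.72 would conflate the two different people.
    ```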

    “Statistically-speaking the number of false negatives cannot be reduced without risking an increase to the false positives (and vice versa), so this means setting the tolerance threshold becomes a sensitive choice,” the report explained. While the government’s focus is on minimising false negatives and saving public funds, in the process this could exclude eligible people from welfare provision. Thus, the report states that it is important for researchers to investigate the internal workings of these systems.
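    The trade-off the report describes can be made concrete with a toy labelled set of record pairs. The scores and ground-truth labels below are fabricated for illustration; the point is only that moving the threshold in one direction trades false negatives for false positives, exactly as the report notes.

    ```python
    # Toy demonstration of the tolerance-threshold trade-off. Each pair is
    # (similarity score, whether the two records truly refer to one person).
    # All numbers are invented.
    pairs = [
        (0.95, True), (0.88, True), (0.82, False),
        (0.78, True), (0.70, False), (0.60, False),
    ]

    def confusion(threshold):
        """Flag a pair as a duplicate ('positive') when its score >= threshold;
        return (false positives, false negatives) at that threshold."""
        fp = sum(1 for score, same in pairs if score >= threshold and not same)
        fn = sum(1 for score, same in pairs if score < threshold and same)
        return fp, fn

    strict  = confusion(0.85)  # no legitimate pair flagged, but one duplicate missed
    lenient = confusion(0.75)  # every duplicate caught, but one legitimate pair flagged
    ```

    At the stricter threshold no legitimate applicant is wrongly flagged but a genuine duplicate slips through; loosening it catches that duplicate at the cost of flagging a legitimate pair, which in a welfare context means excluding an eligible person.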

    Why is algorithmic audit important?

    The Amnesty report underscores the importance of algorithmic audits, which involve assessing an AI system’s performance, outcomes, and impact. Audits enable researchers and technologists to identify issues with AI systems that can ultimately hamper people’s rights to privacy, social security, and non-discrimination. An algorithmic audit provides insights into a system’s internal functioning, the data that’s used to build the system, the scale at which the system is being used, the scope of harm, and systematic discrimination or bias in the algorithm’s outputs or decisions.

    In order to conduct an audit of such systems, the auditors must have access to the data, documentation, and code for the system. However, most of these systems are procured from the private sector, which presents a major challenge for independent researchers in obtaining the necessary documentation for audits. Amnesty International and independent researchers who attempted an external audit of the Samagra Vedika system could not access these documents, so they instead studied an entity resolution algorithm, EntityMatch, that formed the basis of the Samagra Vedika system. However, obtaining APIs from private companies can be expensive and prove to be a barrier for investigators.

    Amnesty International recommends that States implement mandatory human rights due diligence requirements for all companies when government agencies deploy automated systems.

    “This impact assessment must be carried out during the system design, development, use, and evaluation, and – where relevant – retirement phases of automated or algorithmic decision-making systems. The impact on all human rights, including social and economic rights, must be assessed and properly addressed in the human rights due diligence process,” the report noted.

    What did the Amnesty report miss pointing out?

    The Amnesty report rightly probes into algorithmic decision-making for governance in the Indian States. This is critical given that AI is a hot topic currently, but there are also other important issues associated with these massive data collection and data integration exercises that demand attention.

    Violation of citizen privacy: It is important to note that the states are undertaking family database projects to integrate databases across government departments without notifying a specific law. Experts highlight that any data collection exercise leading to in-depth profiling of citizens, conducted in the absence of a law, is a direct violation of the fundamental right to privacy. The absence of a defined law detailing such linking of databases fails the first requirement under the three-fold test of legality, legitimacy, and proportionality for any restriction of the right to privacy, laid out by India’s Supreme Court in Justice K.S. Puttaswamy v Union of India (2017).

    Aadhaar and surveillance: Plans to create a centralised platform for 360-degree profiling of citizens in India stem from another e-governance project, the State Resident Data Hubs (SRDH), undertaken by several states before the 2018 Aadhaar judgment. The SRDH was a state-level repository of Unique Identification Authority of India (UIDAI) data of residents along with demographic information and photos. Aadhaar data is one of the mandatory details recorded in the current systems, as Aadhaar authentication is a prerequisite to accessing welfare services in India.

    In 2018, India’s apex Court noted that Aadhaar, if seeded into every database, becomes a bridge between different “data silos”, and if this database is compromised, it will allow anyone with access to the information to “re-construct a profile of an individual’s life”. This, the Court stated, “is contrary to the right to privacy and poses severe threats due to potential surveillance”.

    Discrimination against specific groups of the population: While ‘efficient governance’ seems to be the stated objective, regardless of whether it is achieved or not, the government has its hands on a lot of granular data about every resident. Speaking to MediaNama, Srikanth L, a public interest technologist, had highlighted that access to these cross-sectoral databases will empower the government with a much clearer and categorised list of information on an individual’s education, health, income level, savings, property and tax-paying behaviour. This can also enable further segregation on the basis of religious and caste groups with district-level categorisation being the next layer of information, thereby benefiting one set of people and directly harming the other. For more information, read MediaNama’s in-depth report on State Family Database (SFDB) projects in India.





