Hong Kong's major digital repositories are sitting on a problem years in the making. Duplicate images — redundant, mislabelled, or unverified photographs stored across public archives, media libraries, and government portals — have accumulated to a point where institutions can no longer defer the question of what to do about them. The decision window is now.
The urgency is sharpest in mid-2026 because several overlapping pressures have converged at once. The Hong Kong Public Libraries network, which falls under the Leisure and Cultural Services Department, is partway through a broader digitisation push that began accelerating after the Covid-era closure of branch reading rooms. That programme has pulled in hundreds of thousands of scanned images from community collections, newspaper morgues, and government photo services. Many of those files entered the system without deduplication protocols. At the same time, private vendors supplying image metadata to financial institutions along Des Voeux Road Central are renegotiating licensing contracts that expire in the third quarter of 2026, and duplicate records are inflating the apparent size — and therefore the price — of those datasets.
What Duplication Actually Costs
Storage is the obvious line item. Commercial cloud storage in Hong Kong, priced through regional providers operating out of Tseung Kwan O's data centre corridor, currently runs at roughly HK$0.18 to HK$0.25 per gigabyte per month for enterprise-tier services, according to market rate comparisons circulating among IT procurement teams. A mid-size newsroom or archive holding 40 terabytes of image files that have never been deduplicated could be paying for several terabytes of redundant data every billing cycle. At an institution like the Hong Kong Film Archive in Sai Wan Ho, which holds physical and digital material spanning decades of local cinema, the scale of the challenge is considerably larger.
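As a rough illustration of that arithmetic, the sketch below applies the quoted price range to a 40-terabyte holding. The 10 percent duplicate share is a hypothetical figure chosen only for illustration, not a measurement from any institution, and the conversion assumes decimal terabytes (1 TB = 1,000 GB).

```python
# Price range quoted in the article, in HK$ per gigabyte per month.
PRICE_LOW_HKD = 0.18
PRICE_HIGH_HKD = 0.25

GB_PER_TB = 1000  # assumption: decimal terabytes


def monthly_cost_range(terabytes):
    """Return (low, high) monthly storage cost in HK$ for a given volume."""
    gigabytes = terabytes * GB_PER_TB
    return gigabytes * PRICE_LOW_HKD, gigabytes * PRICE_HIGH_HKD


holding_tb = 40          # archive size from the article's example
duplicate_share = 0.10   # hypothetical: 10% of the holding is redundant

total_low, total_high = monthly_cost_range(holding_tb)
waste_low, waste_high = monthly_cost_range(holding_tb * duplicate_share)

print(f"Total:     HK${total_low:,.0f} to HK${total_high:,.0f} per month")
print(f"Redundant: HK${waste_low:,.0f} to HK${waste_high:,.0f} per month")
```

Under those assumptions the whole holding costs about HK$7,200 to HK$10,000 a month, of which roughly HK$720 to HK$1,000 pays for redundant copies. A different duplicate share scales the second figure proportionally.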
Beyond cost, there is a legal dimension. Hong Kong's Personal Data (Privacy) Ordinance, Cap. 486, applies when duplicate image sets include photographs of identifiable individuals. Holding multiple unverified copies of the same image of a private person, particularly if those images were obtained through different channels and stored without clear consent records, creates compliance exposure. The Office of the Privacy Commissioner for Personal Data issued updated guidance on data minimisation in late 2024, and institutions that have not audited their image holdings since then have not yet tested those holdings against it.
The Decisions No One Wants to Make First
Three choices are sitting on the desks of archive managers and chief information officers across Wan Chai, Kowloon Tong, and the commercial districts of Central. First: whether to run automated deduplication algorithms across existing holdings and accept that some genuinely distinct images that look nearly identical, or that carry near-identical metadata, will be incorrectly flagged for deletion. The error rate on current perceptual hashing tools is not zero, and for institutions with irreplaceable historical material, that is a meaningful risk.
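To see where that error rate comes from, consider a minimal sketch of one simple perceptual hashing technique, the average hash. It is not necessarily the method any Hong Kong institution uses. The function names, the tiny 4x4 pixel grids, and the `max_distance` threshold are all assumptions made for illustration; real tools typically downscale each image to a small grayscale grid before hashing.

```python
def average_hash(pixels):
    """Compute an average hash from a 2D grid of grayscale values (0-255).

    Each bit is 1 if the pixel is brighter than the grid's mean, else 0.
    The grid is assumed to be already downscaled.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_probable_duplicate(pixels_a, pixels_b, max_distance=1):
    """Flag two images as duplicates if their hashes differ in few bits.

    max_distance is a hypothetical threshold chosen for this sketch.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Hypothetical 4x4 grayscale grids standing in for downscaled photographs.
original = [
    [200, 200, 50, 50],
    [200, 200, 50, 50],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
# A slightly brighter rescan of the same photograph: a true duplicate.
rescan = [[value + 10 for value in row] for row in original]
# A different frame that differs in one corner: distinct, but nearly identical.
near_miss = [row[:] for row in original]
near_miss[3][3] = 60
# An unrelated image: the tonal inverse of the original.
inverted = [[255 - value for value in row] for row in original]

print(is_probable_duplicate(original, rescan))     # True (correct match)
print(is_probable_duplicate(original, near_miss))  # True (false positive)
print(is_probable_duplicate(original, inverted))   # False
```

The second result is the risk in miniature: the near-miss frame is a distinct image, yet its hash differs from the original's by a single bit, so the threshold treats it as a copy. Tightening the threshold reduces such false positives but can let some true duplicates through.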
Second: whether to centralise the deduplication function through a shared service — potentially coordinated by the Office of the Government Chief Information Officer, which has been expanding its role in cross-bureau technology governance since 2023 — or leave individual departments and private operators to solve the problem independently. Centralisation is faster and cheaper. It also means one institution holds decision-making power over what gets kept.
Third, and most practically urgent: what to do with the duplicates once identified. Deletion, archiving to cold storage, or quarantine pending manual review all carry different cost and compliance profiles. Quarantine is the safest option legally, but it defers the storage cost problem rather than solving it.
The timeline pressure is real. Licensing renegotiations tied to commercial image databases serving financial firms in Central are expected to conclude by September 2026. Institutions that have not completed a preliminary deduplication audit by then will be negotiating from a position of incomplete information about what they actually hold. For public archives, the Leisure and Cultural Services Department's internal review calendar suggests a report to the Culture, Sports and Tourism Bureau is expected before the end of the financial year in March 2027. That gives roughly eight months — a tight window for institutions that have spent years letting the backlog grow.