The Daily Hong Kong

Hong Kong news, every day


How Hong Kong's Digital Archives Ended Up Riddled With Duplicate Images — And What It Took to Get Here

Years of rapid digitisation, fragmented government databases, and shifting storage contracts left the city's public record systems awash in redundant files; now institutions are finally reckoning with the backlog.


By Hong Kong News Desk · Published 5 July 2026 at 5:06 am


Updated 5 July 2026 at 1:13 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Hong Kong is independently owned and covers Hong Kong news free from advertiser or sponsor influence.

Photo: Jesse R on Pexels

Hong Kong's public digital archives contain hundreds of thousands of duplicate image files: photographs, scanned documents, and heritage records stored multiple times across incompatible systems. The problem accumulated over roughly two decades and is only now being addressed in any coordinated way. The Government Records Service, based in the Kwun Tong Government Offices complex, confirmed earlier this year that a structured deduplication programme is underway, though it has not publicly committed to a completion date.

The timing matters. With Greater Bay Area integration accelerating cross-border data flows, and Hong Kong pushing to position itself as a digital economy hub in competition with Singapore, the integrity and efficiency of public-sector data infrastructure have moved from a back-office concern to a policy priority. Redundant image files are more than a storage inconvenience: they cause version-control failures, slow down retrieval systems, and in some cases have surfaced contradictory records in legal and planning proceedings.

How the Problem Built Up

The roots of the duplication crisis trace back to the mid-2000s, when individual government bureaux digitised their own holdings independently, with no unified file-naming convention or central metadata standard. The Leisure and Cultural Services Department, which oversees the Hong Kong Public Libraries network including the flagship Central Library on Causeway Bay's Moreton Terrace, ran its own digitisation track. The Planning Department, headquartered in North Point, ran another. The two systems did not speak to each other.

A 2014 audit, the last publicly available comprehensive review of government digital storage, found that at least three separate agencies held overlapping photographic records of the same heritage sites, including buildings along Central's Pottinger Street and structures in the Kowloon Walled City Park precinct. Storage costs were already being flagged as unsustainable even then. By the time cloud migration contracts were signed in the early 2020s, deduplication had been deferred repeatedly, so the redundant files were simply moved offshore at additional expense rather than being resolved.

The period after the National Security Law took effect in June 2020 added a separate layer of complexity. Certain categories of government imagery, such as protest documentation and public-order records, were reclassified or access-restricted, but the underlying file structures were rarely cleaned up. Archivists working within the system have described, in general terms at public records management conferences, a situation in which restricted and unrestricted versions of the same image coexist in different database nodes with no automated reconciliation.

What Deduplication Actually Involves — And Where It Stands

The current programme, which the Government Records Service has described in broad terms in its 2025–26 annual work plan, relies on perceptual hashing, a technique that identifies visually identical or near-identical images even when their file names or metadata differ. The Science Park campus in Pak Shek Kok, Sha Tin, is hosting some of the computational workload through a partnership arrangement with Hong Kong Cyberport's affiliated technology tenants, though the specific contractual details have not been disclosed in public documents.
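Neither the work plan nor the Policy Address documents say which perceptual-hashing algorithm the programme uses. As a rough illustration only, the Python sketch below implements one common variant, difference hashing (dHash), assuming images have already been decoded into grayscale pixel grids. The function names, the 64-bit hash size, the 10-bit match threshold and the record IDs are all hypothetical choices for this example, not details of the government system.

```python
from typing import Dict, List, Tuple

# Assumption: each image is already decoded into a grayscale grid (rows of 0-255 ints).
Image = List[List[int]]


def downscale(img: Image, width: int, height: int) -> List[List[float]]:
    """Shrink an image to width x height by averaging each source block (box filter)."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)  # keep every block non-empty
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Image, size: int = 8) -> int:
    """Difference hash: one bit per left/right neighbour comparison on a (size+1) x size thumbnail."""
    thumb = downscale(img, size + 1, size)
    bits = 0
    for row in thumb:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(images: Dict[str, Image], threshold: int = 10) -> List[Tuple[str, str]]:
    """Return ID pairs whose hashes differ by at most `threshold` bits (naive pairwise scan)."""
    hashes = {name: dhash(img) for name, img in images.items()}
    names = sorted(hashes)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= threshold:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical records: a 64x64 scan, a half-size brightened copy, and an unrelated image.
    scan_a = [[3 * x for x in range(64)] for _ in range(64)]
    scan_b = [[6 * x + 10 for x in range(32)] for _ in range(32)]
    other = [[3 * (63 - x) for x in range(64)] for _ in range(64)]
    print(near_duplicates({"scan_a": scan_a, "scan_b": scan_b, "other": other}))
    # prints [('scan_a', 'scan_b')]
```

Because each bit records only whether one thumbnail cell is brighter than its right-hand neighbour, the resized, brightened copy hashes identically to the original, and file names play no part in the match. Comparing every pair grows quadratically with the number of files, so a system working at archive scale would more plausibly index the hashes (a BK-tree is one common choice) than scan all pairs.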

The scale is significant. Government estimates cited in the 2025 Policy Address supporting documentation reference more than 40 terabytes of image data held across legacy departmental systems, a figure that does not include the separate holdings of the Hong Kong Film Archive in Sai Wan Ho or the Hong Kong Heritage Museum in Sha Tin. Each of those institutions runs its own deduplication cycle on a different schedule.

For institutions and researchers who rely on public records, such as law firms in Central, academic departments at the University of Hong Kong in Pok Fu Lam, and journalists working from the Foreign Correspondents' Club on Lower Albert Road, the practical effect of the cleanup will be faster, more reliable search results and fewer instances of conflicting document versions surfacing in the same query. The Government Records Service has indicated that public-facing archive portals should reflect the improvements by the second quarter of 2027, though departments are migrating to the cleaned datasets on a rolling basis. Anyone with a pending records request is advised to check directly with the relevant bureau whether the files they need sit in an already-processed tranche or are still awaiting deduplication review.


About this article

Published by The Daily Hong Kong

Covering news in Hong Kong. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
