Hong Kong's public-sector digital archive holds tens of millions of photographs, scanned documents and graphic assets spread across at least a dozen separate departmental repositories — and a significant share of that storage is occupied by exact or near-exact duplicates. That is the practical reality now confronting the Government Records Service, which operates out of the former Victoria Barracks site in Admiralty, as it pushes through the second phase of its Smart Government Blueprint revision, a process that formally restarted in January 2026.
The duplication problem did not appear overnight. It accumulated over roughly two decades of well-intentioned but poorly co-ordinated digitisation campaigns, each department building its own pipeline with little cross-referencing against what its neighbours were already storing. The consequence is wasted expenditure on server capacity, slower search retrieval across citizen-facing portals, and a compliance headache as Hong Kong tightens data-governance rules under the updated Personal Data (Privacy) Ordinance framework.
How the Archive Sprawl Took Hold
The roots stretch back to the early 2000s, when departments such as the Land Registry and the Planning Department began independent scanning drives to retire paper files. Both captured substantial overlapping material, including site photographs, cadastral maps and infrastructure diagrams, without a shared metadata standard that would have flagged the repetition. By around 2018, when the Innovation and Technology Bureau (now the Innovation, Technology and Industry Bureau, on Harcourt Road) attempted to impose a unified cloud framework, the duplication was already embedded in legacy systems that were expensive to migrate.
The problem worsened after 2020. Government digitalisation accelerated sharply as the administration sought to modernise services during and after the period of social unrest, and again during pandemic-era remote-working mandates. Departments uploaded materials in bulk to meet internal performance targets, often without deduplication checks. The Smart City Blueprint 2.0, published in December 2020, set ambitious targets for integrating e-government services, but critics in the information-management community argued at the time that back-end data hygiene had not kept pace with front-end delivery promises.
The Financial Secretary's 2025-26 Budget allocated HK$6.8 billion to digital infrastructure and smart government initiatives — a figure cited in the Budget speech delivered on 26 February 2025. Within that envelope, a dedicated line item for data-quality remediation, including duplicate-asset identification, was embedded inside the Digital Government Unit's operating costs for the first time. That represents a formal acknowledgement that the problem has a price tag attached to it.
What Remediation Actually Looks Like
The practical work of duplicate-image replacement involves three distinct stages: automated hash-matching to flag byte-for-byte identical files, perceptual-similarity algorithms to catch near-duplicates such as slightly cropped versions of the same photograph, and human review for assets flagged as ambiguous. The Government Records Service has been piloting this workflow on a subset of the Housing Department's estate-photography archive, which covers public housing estates from Tuen Mun in the west to Tseung Kwan O in the east. Early results from the pilot, discussed at a February 2026 inter-departmental working group, suggested that between 30 and 40 per cent of assets in the test batch were redundant — a proportion consistent with international benchmarks for large public-sector archives that were digitised rapidly without deduplication tooling.
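The three-stage workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Government Records Service's own tooling. It assumes each image has already been decoded and shrunk to an 8-by-9 grayscale grid, and it uses SHA-256 for the exact-match stage. For the perceptual stage it substitutes a simple difference hash (dHash), which may differ from whatever algorithm the pilot actually uses. The `Asset` class, the function names and the Hamming-distance thresholds `near_max` and `review_max` are all hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Asset:
    asset_id: str
    raw_bytes: bytes        # stored file contents, used for the exact-match stage
    gray: list[list[int]]   # assumption: pixels already decoded and shrunk to 8 rows x 9 columns


def exact_duplicate_groups(assets):
    """Stage 1: group assets whose bytes are identical, keyed by SHA-256 digest."""
    groups = defaultdict(list)
    for asset in assets:
        groups[hashlib.sha256(asset.raw_bytes).hexdigest()].append(asset)
    return list(groups.values())


def dhash(gray):
    """Difference hash: one bit per left/right neighbour comparison (8 x 8 = 64 bits)."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def triage(assets, near_max=5, review_max=12):
    """Run all three stages; return (exact groups, near-duplicate pairs, review queue).

    near_max and review_max are hypothetical Hamming-distance thresholds.
    """
    groups = exact_duplicate_groups(assets)
    exact = [[a.asset_id for a in g] for g in groups if len(g) > 1]
    # Only one representative per exact-match group goes on to the perceptual stage.
    reps = sorted((g[0] for g in groups), key=lambda a: a.asset_id)
    hashes = {a.asset_id: dhash(a.gray) for a in reps}

    near, review = [], []
    for i, x in enumerate(reps):
        for y in reps[i + 1:]:
            d = hamming(hashes[x.asset_id], hashes[y.asset_id])
            if d <= near_max:
                near.append((x.asset_id, y.asset_id, d))    # stage 2: flag automatically
            elif d <= review_max:
                review.append((x.asset_id, y.asset_id, d))  # stage 3: route to a human
    return exact, near, review
```

Pairs whose hashes differ by only a few bits are flagged automatically as near-duplicates, while the middle band goes to a reviewer, mirroring the ambiguous cases described above. The all-pairs loop is quadratic, so an archive of tens of millions of items would need an index over the hashes (a BK-tree, for example) rather than this direct comparison.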
Singapore's Government Technology Agency, GovTech, completed a comparable rationalisation exercise across its Whole-of-Government platform between 2022 and 2024, reducing its unstructured data footprint by roughly a quarter according to published agency reports. Hong Kong officials have cited that precedent internally as a reference point for what is achievable within a two-to-three-year window.
For departments still mid-process, the immediate practical step is to freeze new bulk uploads to the centralised iAM Smart backend pending a metadata audit — a recommendation the Digital Government Unit circulated in a March 2026 internal advisory. Longer term, any department procuring new imaging or document-management software after September 2026 will be required under revised procurement guidelines to demonstrate that deduplication is built into the ingest pipeline rather than retrofitted later. That requirement, if enforced consistently, should prevent the same sprawl from recurring in the next generation of government systems.
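One common way to build deduplication into ingest, rather than retrofitting it, is content-addressed storage. Each file is keyed by a digest of its bytes, so a repeat upload becomes a second reference to the existing copy instead of a new one. The sketch below assumes that approach purely for illustration, since the revised guidelines are not described as mandating any particular technique; the `ContentAddressedStore` class and its methods are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentAddressedStore:
    """Hypothetical ingest-time store: each unique file is kept once, keyed by SHA-256."""
    blobs: dict = field(default_factory=dict)    # digest -> file bytes (stored once)
    records: dict = field(default_factory=dict)  # record id -> digest (many ids may share a blob)

    def ingest(self, record_id: str, data: bytes) -> bool:
        """Register a record; return True if new bytes were stored, False if deduplicated."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.records[record_id] = digest  # a duplicate upload only adds a reference
        return is_new

    def fetch(self, record_id: str) -> bytes:
        """Return the file bytes behind a record, wherever they were first stored."""
        return self.blobs[self.records[record_id]]

    def bytes_stored(self) -> int:
        """Total bytes actually held, counting each unique file once."""
        return sum(len(b) for b in self.blobs.values())
```

A check of this kind catches only byte-identical uploads. A production ingest pipeline would presumably pair it with a perceptual-similarity check so that re-encoded or cropped copies are also caught before they are stored.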