Hong Kong's public records offices are sitting on a problem that did not appear overnight. Across government departments, land registries, and identity administration systems, duplicate digital images — scanned documents, property photographs, biometric records — have accumulated into a backlog that administrators are only now beginning to quantify. The issue cuts across the Land Registry in Queensway, the Immigration Department's offices in Wan Chai, and the vast digitisation push that followed the 2020 rollout of the iAM Smart identity platform.
The timing matters. Since 2020, Hong Kong has been under significant pressure to accelerate digital integration with the nine mainland cities of the Greater Bay Area. That process required rapid bulk scanning of legacy paper records — some dating back to colonial-era property transactions — and the ingestion of those files into shared databases. Speed, not deduplication, was the priority. The result is a layered archive where the same document can exist in three or four versions, each tagged with slightly different metadata, making retrieval slower and storage costs higher.
The Infrastructure Behind the Backlog
The roots of the problem stretch back further than the GBA integration push. In 2016, the Office of the Government Chief Information Officer (OGCIO) launched a cloud migration programme intended to consolidate departmental data silos. That programme, by its own published milestones, was supposed to complete a first-phase audit by 2019. The audit was delayed. As a result, when the pandemic hit in early 2020, departments were uploading scanned records in volume without having completed the cross-referencing work that would have caught duplicates early.
Estimates from the government's own digital policy documentation — published under the Digital Economy Development Committee framework — suggest the public sector holds upward of 40 petabytes of unstructured data, though no official figure has been released specifically for duplicate image files. Industry professionals working on government contracts in Cyberport and the Science Park in Pak Shek Kok have described the challenge informally as structural rather than incidental, though none has been authorised to speak on record about specific contract details.
The private sector has its own version of the same headache. Banks operating out of the International Finance Centre in Central and the Exchange Square towers are subject to the Hong Kong Monetary Authority's data retention requirements, which mandate that transaction-related imagery — cheque scans, account-opening photographs — be kept for defined periods. Those retention windows, combined with system migrations between 2021 and 2024 as several institutions relocated processing functions to Singapore or shifted to mainland-linked platforms, created duplicate archives that compliance teams are still working through.
What Cleanup Actually Looks Like
Deduplication is not a single event. It is a sustained process that combines hash matching to identify byte-identical files, human review of near-duplicates whose metadata differs but whose content is the same, and governance decisions about which version of a record is authoritative. For a government department holding land title images registered under the Land Titles Ordinance, choosing the wrong authoritative version carries legal consequences.
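The first of those steps, hash matching, can be illustrated with a minimal Python sketch. It assumes the files sit in an ordinary directory tree and uses SHA-256 as the digest; the function names and the choice of hash are illustrative and do not describe any department's actual tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content digest; keep only groups with 2+ files.

    Hypothetical helper: it compares file bytes only and ignores metadata.
    """
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

A check of this kind catches only byte-identical copies. A document that was re-scanned or re-encoded produces a different digest even when it looks the same, which is why near-duplicates still fall to human review and why deciding which copy is authoritative remains a governance question rather than a technical one.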
The Digital Policy Office, formed in July 2024 by merging the OGCIO with the Efficiency Office, has indicated in its published work plans that a centralised deduplication framework is being developed under the Smart City Blueprint 2.0 initiative. That blueprint, released in updated form in late 2023, sets a broad target for improved data quality across government systems by 2028, though specific metrics for image deduplication have not been published.
For organisations outside government, the practical path forward involves auditing storage environments before the end of the current financial year, which closes in March 2027. Cloud storage costs in Hong Kong — benchmarked against providers operating from Tseung Kwan O's data centre cluster — have risen roughly in line with regional energy prices, making the financial case for cleanup more compelling than it was three years ago. Departments and private firms that delay will find the problem compounding as GBA data-sharing obligations intensify and cross-border compliance requirements demand cleaner, verifiable records.