Hong Kong's public and private sector institutions are sitting on vast archives riddled with duplicate images: redundant photographs, scanned documents and digital assets stored simultaneously across incompatible systems. The effort to clean them up has finally forced a reckoning with decisions made, or avoided, over the better part of two decades.
The issue matters now because the Hong Kong government's push toward Greater Bay Area (GBA) digital integration has exposed just how disorganised the city's own data infrastructure remains. Mainland partners operating under China's unified national cloud architecture have found it difficult to cross-reference records held by Hong Kong agencies whose databases were built independently, often by different vendors, and never rationalised. The result is duplicated imagery piling up in storage, which wastes capacity, inflates costs and slows the data-sharing that the GBA corridor demands.
A Problem Built Layer by Layer
The roots go back to the late 2000s, when individual government bureaux in Wan Chai and Central began digitising paper records without a coordinating framework. The Planning Department on Murray Road, the Lands Department in Queensway Government Offices and the Housing Authority in Sha Tin each contracted separate vendors and built siloed repositories. When those systems were expanded or upgraded over successive administrations, duplicate files were rarely purged — they were simply migrated alongside the originals.
Private institutions followed a similar path. The Hong Kong Jockey Club, whose digital archive spans racing imagery, event photography and member records stretching back decades, has acknowledged in annual reports that data governance reform is ongoing. Banks headquartered in Central, including HSBC at its Queen's Road Central offices, have invested heavily in deduplication tools since the Hong Kong Monetary Authority tightened data management guidelines in 2022, but legacy systems inherited from pre-2010 acquisitions still harbour redundant files.
The scale is not trivial. According to the Digital Policy Office, which was established in July 2024 to consolidate the city's previously fragmented technology governance, government bureaux collectively managed more than 2,000 distinct IT systems as of that year. Industry estimates, widely cited in Hong Kong ICT conference materials from 2024 and 2025, suggest that between 20 and 30 percent of stored image data across large organisations in the city is duplicated in some form, though precise government-specific figures have not been made public.
Why Reform Stalled and What Changed
For years, the practical incentive to clean up duplicate image stores was weak. Storage costs fell steadily, making redundancy cheap to tolerate. When the case for action finally arrived, it came from two directions at once: the GBA integration timetable, which set 2025 as a target year for expanded cross-boundary data flows under the Northern Metropolis development framework, and the Safeguarding National Security Ordinance (the Article 23 legislation) enacted in March 2024, which imposed stricter controls on how sensitive data, including images of individuals, could be held and accessed. Suddenly, knowing exactly what you had stored, and where, became a compliance question, not just an efficiency one.
The Digital Policy Office, which took over the functions of the former Office of the Government Chief Information Officer, has since mandated that bureaux complete image deduplication audits as part of their cloud migration planning. Vendors operating in the Cyberport technology park in Pok Fu Lam have reported growing demand from both government and financial services clients for deduplication and data cataloguing tools since late 2024. Several of those contracts involve cross-referencing image metadata against records held on Mainland systems, a technically complex process given differing file format standards.
For organisations still working through the backlog, the practical steps are clear. Prioritise audit over deletion, since removing an image later flagged as the sole surviving copy of a legal record creates its own liability. Adopt metadata tagging conventions compatible with the GB/T national standards used on the Mainland side of the GBA. And engage the Digital Policy Office's shared services programme before procuring standalone deduplication software. The deadline pressure is real: GBA data corridor reviews are scheduled to continue through the second half of 2026, and institutions that cannot demonstrate clean, non-redundant image archives face delays in obtaining the cross-boundary data transfer approvals they need to operate fully in the expanded market.
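For organisations starting on the audit step, a first pass can be as simple as finding byte-identical image files and listing them for human review rather than deleting anything. The Python sketch below assumes the archive is reachable as an ordinary directory tree; the function names and the list of image file suffixes are illustrative assumptions, not part of any government or vendor tool. It groups files by size, hashes only the files whose sizes collide using SHA-256, and reports each set of identical copies. It will not catch near-duplicates such as re-scans, resized copies or re-encoded JPEGs, which would need perceptual hashing or metadata matching.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical suffix list; a real archive audit would tune this to its holdings.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_duplicates(root: Path) -> dict[str, list[Path]]:
    """Return {sha256: [paths]} for every hash shared by two or more image files.

    Audit only: nothing is moved or deleted, so a reviewer can decide which
    copy is the record of authority before any purge.
    """
    # Pass 1: bucket candidate image files by size; different sizes cannot match.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only files whose size collides with at least one other file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    # Keep only genuine duplicate groups, with paths sorted for stable reports.
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `audit_duplicates(Path("/mnt/archive"))` (the path is a placeholder) returns a mapping from each shared hash to the files that carry it. Checking sizes before hashing keeps the scan cheap on large archives, because only files that could possibly be identical are ever read in full.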