At least one in five images stored across Hong Kong's major public-sector digital platforms is a functional duplicate (same pixel data, different file name, multiple upload dates), according to an audit framework circulated by the Office of the Government Chief Information Officer in the first quarter of 2026. The figure, drawn from a sampling of departmental repositories, points to a storage and governance problem that has been building quietly since the government accelerated its push toward digital services after the Covid-era court and licensing closures of 2020 and 2021.
The timing matters because Hong Kong is midway through a HK$6.8 billion digital transformation programme announced in the 2024 Budget, with a hard deadline of December 2027 for core departmental systems to migrate to a unified cloud architecture. Carrying duplicated visual assets into that new environment inflates migration costs, slows retrieval and, in certain regulated sectors, creates compliance headaches under the data minimisation principles that the Personal Data (Privacy) Ordinance has increasingly been interpreted to cover.
Where the Clutter Is Worst
The problem is not evenly distributed. The Lands Department's GeoSpace Map portal, which serves architects, surveyors and planning consultants working out of offices along Queensway and in Kwun Tong's industrial conversion zone, has accumulated overlapping aerial and satellite image tiles dating back to 2015. Engineers who work with the portal routinely flag version-control errors that stem directly from duplicated raster files. The Hospital Authority's electronic patient record system, HA Go, presents a parallel challenge: radiology and pathology image libraries have grown to a combined size estimated by independent IT consultants at more than 40 petabytes as of late 2025, with deduplication rates significantly below international hospital benchmarks.
Commercial platforms face the same arithmetic. Hong Kong Broadband Network, one of the city's largest local CDN operators, has publicly discussed the bandwidth cost of serving redundant image assets to e-commerce clients — a problem that compounds because many of those clients are cross-border merchants using the city as a logistics and data node for the Greater Bay Area corridor. A 2025 study by the Hong Kong Applied Science and Technology Research Institute found that deduplication interventions on sample retail content databases reduced storage overhead by between 28 percent and 41 percent, depending on catalogue size. For a mid-sized retailer maintaining a product image library of two million files, that translates to a saving of roughly HK$180,000 per year in cloud storage fees at prevailing Hong Kong data-centre rates.
Counting the Cost, Planning the Fix
The numbers compound quickly at institutional scale. The Hong Kong Public Libraries system, which manages digitised collections across 70 branch locations from Sha Tin to Kennedy Town, began a deduplication audit of its Digital Library portal in March 2026. Early results shared at an April symposium hosted by the Hong Kong Computer Society indicated that roughly 340,000 image records — out of a digitised collection then standing at approximately 2.1 million items — were either exact duplicates or near-duplicates differing only in compression artefacts or metadata tags.
Perceptual hashing, the technique most commonly used to identify near-duplicate images by comparing compact fingerprints rather than the full image data, has dropped sharply in processing cost. Running such a hash comparison across a one-million-image library on standard cloud compute now costs under HK$500 in machine time, compared with HK$3,000 to HK$5,000 for equivalent jobs three years ago. That drop in cost is finally making institution-wide clean-up programmes economically viable for organisations that previously deferred the work.
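For readers who want a concrete picture of perceptual hashing, the sketch below implements one simple variant, the average hash, in plain Python. It is an illustration only and is not drawn from any of the audits described here: the function names, the 8x8 fingerprint size, the distance threshold of 5 and the synthetic test images are all assumptions, and a production system would decode real image files and would likely use a more robust hash.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def downsample(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Shrink a grayscale grid to size x size by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [v for row in downsample(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Assumed rule of thumb: fingerprints within max_distance bits match."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Synthetic 16x16 test images (hypothetical data, not real photographs).
original = [[(200 if x >= 8 else 40) + y for x in range(16)] for y in range(16)]
# Same picture with small pixel-level noise, standing in for compression artefacts.
noisy = [[original[y][x] + ((x * 7 + y * 3) % 5 - 2) for x in range(16)]
         for y in range(16)]
# A genuinely different picture: bright top half instead of bright right half.
different = [[(200 if y < 8 else 40) + x for x in range(16)] for y in range(16)]

print(is_near_duplicate(original, noisy))
print(is_near_duplicate(original, different))
```

The point of the design is that each image is reduced to a 64-bit number, so comparing two images becomes a cheap bit count rather than a pixel-by-pixel comparison, which is why the cost scales to libraries of a million images or more.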
For content managers, archivists and IT procurement officers across Hong Kong, the practical path forward involves three immediate steps: commissioning a baseline deduplication audit before any cloud migration contract is signed; inserting image-hash verification into upload pipelines so new duplicates cannot enter the system; and negotiating storage contracts that price on deduplicated rather than gross volume. The OGCIO's draft procurement guidelines for 2027 cloud tenders are expected to make the third point a mandatory clause. Departments that move early will enter migration with leaner, cleaner libraries — and a smaller bill when the December 2027 deadline arrives.
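The second step, hash verification at upload, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any department's actual pipeline: the `UploadGate` class, its `submit` method and the in-memory dictionary are hypothetical, and a real deployment would keep the registry in a persistent database.

```python
import hashlib
from typing import Dict, Optional


class UploadGate:
    """Hypothetical in-memory registry of content digests seen so far."""

    def __init__(self) -> None:
        # Maps SHA-256 hex digest -> name of the first file stored with those bytes.
        self._seen: Dict[str, str] = {}

    def submit(self, filename: str, data: bytes) -> Optional[str]:
        """Register an upload.

        Returns None if the file is new and was accepted, or the name of the
        already-stored file when the bytes are an exact duplicate.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing  # duplicate: caller can reject or link to the original
        self._seen[digest] = filename
        return None


gate = UploadGate()
first = gate.submit("harbour_2019.jpg", b"example image bytes")
second = gate.submit("harbour_copy.jpg", b"example image bytes")
print(first, second)
```

A cryptographic digest of this kind catches only byte-identical files, the "same pixel data, different file name" case. Copies that differ in compression artefacts or metadata tags would pass this check and need a perceptual comparison as well.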