The Daily Hong Kong


Hong Kong's Duplicate Image Problem: The Numbers Driving a Digital Clean-Up Push

New data reveals the scale of redundant visual content clogging government portals, commercial platforms and media archives across the city — and what it is costing in storage, bandwidth and credibility.


By Hong Kong News Desk · Published 5 July 2026 at 4:58 am


Updated 5 July 2026 at 1:17 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Hong Kong is independently owned and covers Hong Kong news free from advertiser or sponsor influence.

Photo: Alex M on Pexels

At least one in five images stored across Hong Kong's major public-sector digital platforms is a functional duplicate — same pixel data, different file name, multiple upload dates — according to an audit framework circulated by the Office of the Government Chief Information Officer in the first quarter of 2026. The figure, drawn from a sampling of departmental repositories, points to a storage and governance problem that has been building quietly since the government's accelerated push toward digital services that followed the Covid-era court and licensing closures of 2020 and 2021.

The timing matters because Hong Kong is midway through a HK$6.8 billion digital transformation programme announced in the 2024 Budget, with a hard deadline of December 2027 for core departmental systems to migrate to a unified cloud architecture. Carrying duplicated visual assets into that new environment inflates migration costs, slows retrieval and, in certain regulated sectors, creates compliance headaches under the data minimisation principles that the Personal Data (Privacy) Ordinance has increasingly been interpreted to cover.

Where the Clutter Is Worst

The problem is not evenly distributed. The Lands Department's GeoSpace Map portal, which serves architects, surveyors and planning consultants working out of offices along Queensway and in Kwun Tong's industrial conversion zone, has accumulated overlapping aerial and satellite image tiles dating back to 2015. Engineers who work with the portal routinely flag version-control errors that stem directly from duplicated raster files. The Hospital Authority's electronic patient record system, HA Go, presents a parallel challenge: radiology and pathology image libraries have grown to a combined size estimated by independent IT consultants at more than 40 petabytes as of late 2025, with deduplication rates significantly below international hospital benchmarks.

Commercial platforms face the same arithmetic. Hong Kong Broadband Network, one of the city's largest local CDN operators, has publicly discussed the bandwidth cost of serving redundant image assets to e-commerce clients — a problem that compounds because many of those clients are cross-border merchants using the city as a logistics and data node for the Greater Bay Area corridor. A 2025 study by the Hong Kong Applied Science and Technology Research Institute found that deduplication interventions on sample retail content databases reduced storage overhead by between 28 percent and 41 percent, depending on catalogue size. For a mid-sized retailer maintaining a product image library of two million files, that translates to a saving of roughly HK$180,000 per year in cloud storage fees at prevailing Hong Kong data-centre rates.
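The article does not say which end of the 28 to 41 percent range the HK$180,000 figure assumes. As a back-of-envelope check, the sketch below works backwards from the quoted saving to the annual storage bill each end of the range would imply. The function name and the assumption that the saving scales linearly with the reduction are illustrative, not taken from the study.

```python
def implied_annual_bill(saving_hkd: float, reduction: float) -> float:
    """Annual storage bill that would produce `saving_hkd` at a fractional reduction.

    Assumes (for illustration only) that fees scale linearly with stored volume.
    """
    if not 0 < reduction <= 1:
        raise ValueError("reduction must be a fraction in (0, 1]")
    return saving_hkd / reduction


SAVING_HKD = 180_000      # annual saving quoted in the article
LOW, HIGH = 0.28, 0.41    # reduction range quoted in the article

# A larger reduction needs a smaller starting bill to reach the same saving.
bill_if_high = implied_annual_bill(SAVING_HKD, HIGH)
bill_if_low = implied_annual_bill(SAVING_HKD, LOW)

print(f"Implied annual bill: HK${bill_if_high:,.0f} to HK${bill_if_low:,.0f}")
```

Under those assumptions the quoted saving corresponds to a pre-clean-up bill somewhere between the two printed values; a retailer with a different bill can multiply it by 0.28 and 0.41 to bracket its own saving.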

Counting the Cost, Planning the Fix

The numbers compound quickly at institutional scale. The Hong Kong Public Libraries system, which manages digitised collections across 70 branch locations from Sha Tin to Kennedy Town, began a deduplication audit of its Digital Library portal in March 2026. Early results shared at an April symposium hosted by the Hong Kong Computer Society indicated that roughly 340,000 image records — out of a digitised collection then standing at approximately 2.1 million items — were either exact duplicates or near-duplicates differing only in compression artefacts or metadata tags.
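For the exact-duplicate half of an audit like this, a minimal sketch is to group records by a cryptographic digest of their bytes. This is an illustration, not the libraries' method: the file names and byte strings are made up, and byte-level hashing only catches identical files, so copies that differ in metadata tags or compression would need a perceptual comparison instead.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of file names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more files are duplicate groups.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def surplus_share(total_records: int, groups: list[list[str]]) -> float:
    """Fraction of records that are redundant (every copy beyond the first in a group)."""
    if total_records == 0:
        return 0.0
    surplus = sum(len(group) - 1 for group in groups)
    return surplus / total_records


# Hypothetical sample: three names for one payload, plus two unique files.
sample = {
    "harbour_2019.jpg": b"\x01\x02\x03",
    "harbour_copy.jpg": b"\x01\x02\x03",
    "IMG_0042.jpg": b"\x01\x02\x03",
    "tram.jpg": b"\x09\x09",
    "peak.jpg": b"\x07",
}

groups = group_exact_duplicates(sample)
share = surplus_share(len(sample), groups)
print(groups, f"{share:.0%} of records are surplus copies")
```

For scale, the figures reported at the symposium work out to 340,000 / 2,100,000, or roughly 16 percent of the collection.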

Perceptual hashing, the technique most commonly used to identify near-duplicate images by comparing compact fingerprints rather than the full files, has dropped sharply in processing cost. Running such a hash comparison across a one-million-image library on standard cloud compute now costs under HK$500 in machine time, compared with HK$3,000 to HK$5,000 for equivalent jobs three years ago. That cost collapse is what is finally making institution-wide clean-up programmes economically viable for organisations that previously deferred the work.
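To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the average hash: shrink the image to an 8x8 greyscale grid, set one bit per cell depending on whether it is brighter than the grid's mean, and compare two hashes by counting differing bits. It is not necessarily the variant any institution named here uses. Images are plain lists of 0-255 values so the example needs no imaging library, and the 5-bit threshold is an illustrative tuning choice.

```python
def downscale(pixels: list[list[int]], size: int = 8) -> list[list[float]]:
    """Box-average a 2D greyscale grid down to size x size cells."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    grid = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Pack one bit per cell: 1 if the cell is brighter than the mean cell."""
    cells = [v for row in downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


THRESHOLD = 5  # assumed cut-off; real audits tune this on labelled samples

# Synthetic 16x16 test images: a horizontal gradient, a slightly perturbed
# copy standing in for compression artefacts, and an unrelated vertical gradient.
original = [[x * 16 for x in range(16)] for _ in range(16)]
recompressed = [
    [min(255, max(0, original[y][x] + (x + y) % 3 - 1)) for x in range(16)]
    for y in range(16)
]
unrelated = [[y * 16 for _ in range(16)] for y in range(16)]

h_original = average_hash(original)
h_recompressed = average_hash(recompressed)
h_unrelated = average_hash(unrelated)

print(hamming(h_original, h_recompressed) <= THRESHOLD)  # near-duplicate?
print(hamming(h_original, h_unrelated) <= THRESHOLD)     # near-duplicate?
```

Because each image collapses to a 64-bit integer, a library-wide comparison works on fingerprints alone, which is part of why the job is cheap. A production pipeline would decode real files with an imaging library before hashing.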

For content managers, archivists and IT procurement officers across Hong Kong, the practical path forward involves three immediate steps: commissioning a baseline deduplication audit before any cloud migration contract is signed; inserting image-hash verification into upload pipelines so new duplicates cannot enter the system; and negotiating storage contracts that price on deduplicated rather than gross volume. The OGCIO's draft procurement guidelines for 2027 cloud tenders are expected to make the third point a mandatory clause. Departments that move early will enter migration with leaner, cleaner libraries — and a smaller bill when the December 2027 deadline arrives.
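The second and third steps can be sketched together: a gate in the upload path that refuses payloads it has already stored, while tracking gross and deduplicated volume separately so the two pricing bases can be compared. The `UploadGate` class and its fields are hypothetical names for illustration and do not describe any system mentioned in this article. It checks exact byte matches only.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UploadGate:
    """Hypothetical upload filter that stores each distinct payload once."""

    seen: dict = field(default_factory=dict)  # digest -> name of first upload
    gross_bytes: int = 0    # everything submitted, duplicates included
    stored_bytes: int = 0   # what is actually kept after deduplication

    def submit(self, name: str, data: bytes) -> Optional[str]:
        """Accept new content (returns None) or return the name of the existing copy."""
        self.gross_bytes += len(data)
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.seen:
            return self.seen[digest]  # duplicate: point the caller at the original
        self.seen[digest] = name
        self.stored_bytes += len(data)
        return None


gate = UploadGate()
first = gate.submit("map_tile_a.png", b"tile-bytes")
second = gate.submit("map_tile_a_v2.png", b"tile-bytes")  # same bytes, new name
third = gate.submit("map_tile_b.png", b"other")

print(first, second, third)
print(f"gross={gate.gross_bytes} bytes, stored={gate.stored_bytes} bytes")
```

A contract priced on gross volume would bill for the first number and one priced on deduplicated volume for the second. The gap between them is the amount at stake in the negotiation.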


About this article

Published by The Daily Hong Kong

Covering news in Hong Kong. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
