The Daily Hong Kong

Hong Kong news, every day

Hong Kong's Digital Archives Are Riddled With Duplicate Images — and the Numbers Tell a Costly Story

A growing body of data reveals how duplicate image files are draining server budgets, slowing government portals, and quietly undermining the city's push to become a regional digital hub.

By Hong Kong News Desk · Published 5 July 2026 at 5:16 am

Updated 5 July 2026 at 1:26 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Hong Kong is independently owned and covers Hong Kong news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Reynaldo #brigworkz Brigantty on Pexels

Hong Kong's public-sector digital infrastructure is carrying tens of millions of redundant image files — duplicates generated through years of uncoordinated uploads, migrated legacy databases, and agency-by-agency IT procurement — and the cumulative storage bill is measurable. According to procurement records published by the Government Logistics Department, storage-related contracts awarded to cloud and data-centre operators by Hong Kong government bureaux rose to roughly HK$2.3 billion in the 2024–25 fiscal year, a figure that IT auditors have flagged as partly attributable to uncleaned file repositories.

The issue matters now because the Hong Kong government's Digital Policy Office, established in 2024, has set a 2028 target for full inter-bureau data interoperability under its Digital Blueprint. Every month that duplicate image libraries sit untouched on siloed servers raises the risk that the integration deadline will slip. The problem is not unique to government: private-sector platforms and media organisations operating out of Wan Chai and Quarry Bay have reported comparable inefficiencies as they migrate legacy content management systems to hybrid cloud environments.

What the Data Actually Shows

Industry benchmarks published by the Hong Kong Internet Exchange — which routes the bulk of local commercial traffic — suggest that image assets account for roughly 62 percent of total data weight on Hong Kong-hosted consumer-facing websites, higher than the global average of 53 percent recorded in a 2024 HTTP Archive study. Within that image load, deduplication tools deployed by three Cyberport-based tech firms in a 2025 pilot program found that between 18 and 27 percent of stored images were exact or near-exact duplicates, generating no incremental user value. Scaled across a mid-sized e-commerce operator running warehoused product photography — the kind clustered along the logistics corridors in Kwun Tong and Tuen Mun — that translates to hundreds of gigabytes of redundant data per company per year.

Storage costs in Hong Kong data centres currently run between HK$180 and HK$320 per terabyte per month depending on tier and contract length, according to published rate cards from PCCW-HKT and SUNeVision's iAdvantage facilities in Tsuen Wan and Tseung Kwan O. Even at the lower end, a company sitting on 10 terabytes of duplicate images is spending HK$21,600 a year (10 terabytes at HK$180 per month, over 12 months) on storage that returns nothing. Multiply that across the dozens of public statutory bodies, among them the Hospital Authority, the MTR Corporation and the Housing Authority, each maintaining its own image repository, and the waste compounds rapidly.
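The arithmetic behind that figure can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and the inputs are the per-terabyte rates and the 10-terabyte volume quoted above, not data from any specific operator.

```python
def annual_duplicate_cost(duplicate_tb: float, rate_hkd_per_tb_month: float) -> float:
    """Yearly cost in HK$ of storing `duplicate_tb` terabytes of redundant files."""
    return duplicate_tb * rate_hkd_per_tb_month * 12

# Rates quoted in the article: HK$180 (low tier) to HK$320 (high tier) per TB per month.
low = annual_duplicate_cost(10, 180)
high = annual_duplicate_cost(10, 320)
print(f"HK${low:,.0f} to HK${high:,.0f} per year")
```

For 10 terabytes the range works out to HK$21,600 at the cheapest tier and HK$38,400 at the most expensive one, before bandwidth or backup charges are counted.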

Local Efforts to Close the Gap

The Hong Kong Science and Technology Parks Corporation launched a deduplication-focused sandbox under its Inno Space program in March 2025, drawing eleven startups working on perceptual hashing and AI-assisted image fingerprinting. Perceptual hashing derives a compact fingerprint from what an image looks like rather than from its raw bytes, so it can match visually identical or near-identical images even when file names, metadata or compression settings differ. Such mismatches are a common source of accidental duplication when content teams re-upload assets from different workstations. The sandbox runs at the Pak Shek Kok campus in Tai Po, and its first cohort reported deduplication rates of up to 31 percent on test datasets pulled from cooperating retail clients.
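For readers unfamiliar with the technique, the following is a minimal sketch of one simple perceptual hash, the "average hash". It is not the method used by any sandbox participant, which the source does not describe. It assumes an image has already been decoded into a grid of grayscale values at least 8 pixels on each side; the names `shrink`, `average_hash` and `hamming` are illustrative.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values (0-255), one inner list per row

def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging blocks (assumes each side >= size)."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        row = []
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

def average_hash(pixels: Grid, size: int = 8) -> int:
    """Set one bit per downsampled cell: 1 if brighter than the mean, else 0."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")

# Synthetic 16x16 test images: a left-to-right gradient, a slightly
# brightened copy of it, and its inverse.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brightened = [[min(255, v + 3) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

h_orig = average_hash(original)
print(hamming(h_orig, average_hash(brightened)))  # small distance: likely duplicate
print(hamming(h_orig, average_hash(inverted)))    # large distance: different image
```

In practice an audit would pick a distance threshold below which two images are treated as duplicates, and would decode real files with an imaging library rather than build grids by hand.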

The Digital Policy Office has not yet published a government-wide deduplication policy, though its 2025 annual progress report noted that a cross-bureau data hygiene working group met four times during the year. Separately, the Hong Kong Public Libraries system — which digitised more than 1.4 million items through its Hong Kong Memory project based at the Central Library on Causeway Bay's Moreton Terrace — acknowledged in a 2025 operational review that its image repository contained a material volume of duplicate scans created during batch digitisation runs between 2018 and 2022.

For organisations wanting to act before any government mandate arrives, the practical steps are well-established: run a perceptual hash audit on existing image libraries, establish a single canonical asset management system before any cloud migration, and set upload validation rules that reject files whose hash matches an existing record. Storage savings alone rarely justify the full project cost, but when page-load performance, bandwidth fees, and compliance readiness under the city's data governance framework are factored in, the return-on-investment case becomes considerably harder to ignore.
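The upload validation rule can be sketched as follows. This is a hedged illustration, not a description of any named system: `AssetRegistry` is a hypothetical in-memory stand-in for a canonical asset management system, and it uses a SHA-256 digest of the file bytes, which catches only exact byte-for-byte duplicates. Catching resized or recompressed copies would need a perceptual hash and a distance threshold instead.

```python
import hashlib
from typing import Dict, Tuple

class AssetRegistry:
    """Hypothetical in-memory registry mapping content digests to asset names."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def upload(self, name: str, data: bytes) -> Tuple[bool, str]:
        """Accept new content; reject content whose digest is already recorded.

        Returns (accepted, canonical_name). On rejection, canonical_name is
        the name of the existing asset the upload duplicates.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return False, existing
        self._by_digest[digest] = name
        return True, name

registry = AssetRegistry()
print(registry.upload("harbour.jpg", b"example image bytes"))       # accepted
print(registry.upload("harbour_copy.jpg", b"example image bytes"))  # rejected as duplicate
```

Returning the name of the existing asset on rejection lets the uploader link to the canonical copy rather than simply failing.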

About this article

Published by The Daily Hong Kong

Covering news in Hong Kong. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
