Originally published on 31 August 2022 on my old WordPress blog. I'm resharing it here as the first piece of writing on the new site, with screenshots dropped, links refreshed, and the Malaysia portal entry updated for 2026.

As I continue my journey deeper into data, one lesson keeps coming back: finding the dataset itself can eat up far more time than you’d expect. Finding something clean, relevant, and big enough to model on is half the battle, and that battle happens before you’ve written a line of code. Below is the round-up of open-data portals I keep going back to.

1. Malaysia Open Data Portal

Malaysia’s official open-data portal, covering datasets from ministries, state governments, and statistical agencies. The portal was modernised under the Public Sector Open Data initiative and now sits alongside related platforms such as OpenDOSM (Department of Statistics Malaysia) and KKMNOW (Ministry of Health). Its current tagline (“High frequency. High granularity. High impact.”) shows how far the ecosystem has come since the older MAMPU-managed days.

Link: https://data.gov.my/

2. Kaggle

The platform most data scientists already know. It hosts datasets across nearly every domain, free to download (an account is required), plus competitions, notebooks, and active discussion threads.

Link: https://www.kaggle.com/

3. Asia Open Data Portal

A meta-portal that harvests metadata from 20+ Asian data portals, indexing tens of thousands of open datasets in one place. Useful when you want regional data and don’t want to bounce between national portals.

Link: https://dataportal.asia/

4. UCI Machine Learning Repository

A staple of the ML community: databases, domain theories, and data generators widely used in research and teaching. The “beta” version I linked in 2022 has since become the production site.

Link: https://archive.ics.uci.edu/

5. OECD

Best for macroeconomic data and member-country statistics: economic indicators, papers, and time series for cross-country comparison.

Link: https://www.oecd-ilibrary.org/statistics


Original source: Jotham Lim, Listicle: Open Data Platforms Every Malaysian Data Scientist Should Know (Digital Edge, The Edge Markets, 2021).