sdmx-core developer guide

The sdmx-core Java library provides a comprehensive framework for anything SDMX-related, from reading and writing SDMX metadata or data to building an enterprise SDMX application.

flowchart TD
  SDMX["<b>SDMX</b><br/>The standard for statistical data and metadata"]

  SDMX --> DATA["<b>Data</b><br/>Observed values — the facts"]
  SDMX --> STRUCT["<b>Structural Metadata</b><br/>Gives data meaning and structure"]
  SDMX --> REF["<b>Reference Metadata</b><br/>Methodology, quality, and context"]

A BIS Open Tech initiative

sdmx-core is owned by the Bank for International Settlements and published under the BIS Open Tech initiative.

BIS Open Tech is a platform for sharing statistical and financial software as public goods, promoting international cooperation and coordination. These software tools are developed according to international best practices and standards, and can be reused and further developed in a wide variety of environments.

Origins

sdmx-core started life in 2007, at a time when there were no industry-strength libraries for working with SDMX content. The ambition was to provide a future-proof, industry-strength Java framework for SDMX software development. The framework enables the rapid development of a diverse set of SDMX applications, without developers having to worry about data volume, message syntax, or future changes to the SDMX specification.

The 2007 project, led by Matthew Nelson at Metadata Technology, produced a library called SdmxSource, which was released as open source in 2010. Work on the library continued, driven by the requirements of the software it supported. The result, almost 20 years on, is a mature, industry-hardened Java library. The library was rebranded as sdmx-core because it lies at the core of the Fusion applications it was built to support.

Industry Strength

sdmx-core lies at the heart of the industry-strength Fusion Metadata Registry (FMR) and the URN resolver service. It has been battle-tested in data collection, validation, and structural metadata management, and is used by commercial vendors in their SDMX solutions.

Future Proof

sdmx-core does not couple an application to a specific version of SDMX. It achieves this by reading SDMX content into a version-agnostic object model based on the SDMX Information Model (SDMX-IM). To ensure compatibility with previous versions of SDMX, the framework can be asked to convert non-forward-compatible SDMX model changes into the current SDMX-IM. When SDMX objects are serialised back into an SDMX syntax, the user chooses which version and format to write. The version-agnostic data and metadata model decouples the application's business logic from a specific version of the standard and future-proofs the application against change, whilst preserving its ability to adopt new versions of the SDMX specification at minimal cost.
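The decoupling described above can be illustrated with a minimal, self-contained sketch. It does not use the sdmx-core API: every name in it (`VersionAgnosticSketch`, `Codelist`, `Syntax`, `read`, `write`, `countCodes`) and both toy text syntaxes are invented for illustration, and real SDMX messages are far richer. The point is only the shape of the design: readers for different syntaxes populate one model, business logic sees only that model, and the output syntax is chosen at write time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names come from sdmx-core.
final class VersionAgnosticSketch {

    // Invented syntaxes standing in for different versions of a standard.
    enum Syntax { LEGACY, CURRENT }

    // Version-agnostic model: business logic depends only on this type.
    static final class Codelist {
        final String id;
        final Map<String, String> codes; // code id -> name, in document order

        Codelist(String id, Map<String, String> codes) {
            this.id = id;
            this.codes = codes;
        }
    }

    // Reads either invented syntax into the same model.
    // LEGACY:  "CL_FREQ|A=Annual|M=Monthly"
    // CURRENT: "id:CL_FREQ;A:Annual;M:Monthly"
    static Codelist read(String text, Syntax syntax) {
        boolean legacy = syntax == Syntax.LEGACY;
        String[] parts = text.split(legacy ? "\\|" : ";");
        String id = legacy ? parts[0] : parts[0].substring("id:".length());
        Map<String, String> codes = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split(legacy ? "=" : ":", 2);
            codes.put(kv[0], kv[1]);
        }
        return new Codelist(id, codes);
    }

    // Writes the model in whichever syntax the caller chooses.
    static String write(Codelist cl, Syntax syntax) {
        boolean legacy = syntax == Syntax.LEGACY;
        StringBuilder sb = new StringBuilder(legacy ? cl.id : "id:" + cl.id);
        for (Map.Entry<String, String> e : cl.codes.entrySet()) {
            sb.append(legacy ? '|' : ';')
              .append(e.getKey())
              .append(legacy ? '=' : ':')
              .append(e.getValue());
        }
        return sb.toString();
    }

    // Example business logic: it never needs to know the source syntax.
    static int countCodes(Codelist cl) {
        return cl.codes.size();
    }

    public static void main(String[] args) {
        // Read the legacy syntax, run logic on the model, write the current syntax.
        Codelist cl = read("CL_FREQ|A=Annual|M=Monthly", Syntax.LEGACY);
        System.out.println(countCodes(cl));
        System.out.println(write(cl, Syntax.CURRENT));
    }
}
```

In this sketch, supporting a new syntax means adding one more branch to `read` and `write`; `countCodes` and any other logic written against `Codelist` would stay unchanged.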

Optimised Memory Management

sdmx-core was designed so that large amounts of data can be read or written without large amounts of memory being thrown at the application. Data reads and writes are streamed, keeping memory requirements to a minimum. For SDMX structural metadata, which includes Concepts, Codelists, and Data Structures, the deferred bean framework ensures that vast amounts of metadata can be passed around a system, with the content paged in and out of memory as required. Together, these design principles enable systems to work with large volumes of SDMX content without requiring vast amounts of memory.
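The two ideas in this section, streaming and deferred loading, can be sketched in plain Java. This is not the sdmx-core implementation or its API: the class `LowMemorySketch`, the method `sumValues`, the `Deferred` holder, and the toy `seriesKey,period,value` line format are all assumptions made for illustration. In particular, the explicit `evict()` call merely stands in for whatever paging policy a real deferred bean framework applies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical illustration only: none of these names come from sdmx-core.
final class LowMemorySketch {

    // Streaming: handle one observation line ("seriesKey,period,value") at a
    // time and fold it into a running total, so memory use does not grow
    // with the size of the input.
    static double sumValues(Reader source) {
        double total = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                total += Double.parseDouble(fields[2]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Deferred loading: callers pass around a cheap handle; the content is
    // loaded on first access and can be dropped and reloaded later.
    static final class Deferred<T> {
        private final Supplier<T> loader;
        private T value;   // null while the content is paged out
        private int loads; // number of times the loader has run

        Deferred(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (value == null) {
                value = loader.get();
                loads++;
            }
            return value;
        }

        void evict() {
            value = null; // page the content out; the handle stays valid
        }

        int loadCount() {
            return loads;
        }
    }

    public static void main(String[] args) {
        String data = "A.US,2020,1.5\nA.US,2021,2.5\nA.FR,2020,4.0\n";
        System.out.println(sumValues(new StringReader(data)));

        Deferred<String> codelist = new Deferred<>(() -> "CL_FREQ content");
        codelist.get();   // first access loads the content
        codelist.get();   // second access reuses it
        codelist.evict(); // page out
        codelist.get();   // reloaded on demand
        System.out.println(codelist.loadCount());
    }
}
```

The same `sumValues` call would work unchanged on a file-backed `Reader` of any size, which is the property the streaming design is after.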

Optimised Performance

sdmx-core has been optimised for performance, to ensure fast read and write operations for both metadata and data.

On a commodity laptop (MacBook M3), a paged (off-heap) read of Eurostat structural metadata from a 2.5 GB JSON file, containing over 16k maintainable structures and just under 4 million items, takes just over 1 minute. Writing the content back out as an SDMX-ML (XML) structure file takes 16 seconds.

On the same laptop, fully reading a 1 GB dataset in SDMX-ML 3.0 containing 450k series and 10 million observations takes 14 seconds. Writing the same dataset as SDMX-CSV also takes 14 seconds.

Acknowledgements

The original SdmxSource framework was created by Matthew Nelson in 2007 at Metadata Technology, with the ambition of providing an industry-strength, future-proof SDMX development framework at a time when no such libraries existed.

In 2010, Phil Lazarou joined as co-developer, and from then on the two led the architecture, design, implementation, and long-term evolution of the framework. Over almost two decades of continuous development, the library evolved from the original SdmxSource project into what is now known as sdmx-core.

The direction and maturity of the framework have been shaped by real-world operational requirements, large-scale SDMX implementations, evolving SDMX specifications, and practical experience gained through the development of enterprise SDMX solutions and infrastructure.

In recent years, sdmx-core has also benefited from contributions, feedback, testing, and collaboration from external organisations and the wider SDMX community. These contributions continue to help strengthen the framework and support its ongoing evolution.