Enabling high throughput in widely distributed data management and analysis systems: Lessons from the LHC

Today's large-scale science projects all involve worldwide collaborations that routinely move tens of petabytes per year between international sites. This is true for the two main experiments at the Large Hadron Collider (LHC) at CERN, ATLAS and CMS, and for the climate science community. In the near future, experiments such as Belle II at the KEK accelerator in Japan, the genome science community, the Square Kilometre Array radio telescope, and ITER, the international fusion energy experiment, will all involve comparable data movement in order to accomplish their science. The capabilities required to support this scale of data movement involve hardware and software developments at every level: fiber signal transport, layer 2 transport (e.g., Ethernet), data transport (TCP is still the norm), operating system evolution, data movement and management techniques and software, and increasing sophistication in the distributed science applications themselves. Further, ESnet's years of science requirements gathering indicate that these issues hold across essentially all science disciplines that rely on the network for significant data transfer, even if the quantities are modest compared to projects like the LHC experiments. This talk will provide some of the context and then discuss each of the topics that experience has shown are essential to enabling large-scale "data-intensive" science.
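As a rough, back-of-the-envelope illustration of the rates behind "tens of petabytes per year," and of why TCP behavior on long international paths is singled out above, the sketch below computes the sustained bandwidth a yearly volume implies and the classic single-flow TCP bound from the Mathis et al. approximation. The specific RTT and loss-rate values are illustrative assumptions, not figures from the talk:

```python
import math

def sustained_gbps(petabytes_per_year: float) -> float:
    """Average rate (Gb/s) needed to move a yearly volume around the clock."""
    bits = petabytes_per_year * 1e15 * 8      # decimal petabytes -> bits
    seconds = 365 * 24 * 3600                 # one non-leap year
    return bits / seconds / 1e9

def mathis_throughput_mbps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Rough upper bound for one standard TCP flow (Mathis et al. approximation):
    throughput ~ MSS / (RTT * sqrt(p))."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate)) / 1e6

print(f"{sustained_gbps(10):.1f} Gb/s sustained for 10 PB/year")   # ~2.5 Gb/s
print(f"{sustained_gbps(50):.1f} Gb/s sustained for 50 PB/year")   # ~12.7 Gb/s
# One TCP flow on an assumed 100 ms intercontinental path with 1e-4 packet loss:
print(f"{mathis_throughput_mbps(1460, 0.100, 1e-4):.0f} Mb/s single-flow bound")  # ~12 Mb/s
```

The gap between the multi-gigabit sustained requirement and the roughly 12 Mb/s a single TCP flow can achieve at even a modest loss rate on a long path is one way to see why engineering is needed at every layer the abstract lists, from the fiber up through the data movement software.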



  • William Johnston (ESnet), Michael Ernst (Brookhaven National Lab), Eli Dart (ESnet), and Brian Tierney (ESnet)

Part of session: Big Data, Big Deal
