In Australia, the Peter MacCallum Cancer Centre and the John Holland Group, an infrastructure and construction firm, have successfully addressed significant data fragmentation problems by leveraging the cloud data and AI platform Databricks. This solution has enabled them to draw valuable insights from their business data.
At Databricks’ Data + AI World Tour in Sydney, Australia, last month, tech leaders from both organizations shared their unique data challenges. These included siloed data, competing business areas, data integration issues, and legacy systems, all of which necessitated a cloud data solution.
Peter MacCallum Cancer Centre consolidates data to use AI
Peter Mac’s legacy data infrastructure limited its ability to effectively leverage big data and AI across its extensive clinical and research operations. The legacy technology also jeopardized its mission to improve the lives of people with cancer, including using AI to improve clinical decision-making and accelerate biological insights and drug discovery.
Problems with data infrastructure
During the conference, Jason Li, head of the bioinformatics core facility in Peter Mac’s cancer research division, said:
- Peter Mac was dealing with various siloed data and legacy systems.
- The complexity and volume of clinical and research data across the cancer center’s operations posed data storage and analytics challenges.
- Ethical, privacy, and safety concerns were key factors in governing Peter Mac’s data and in deploying future AI use cases.
- Integration between clinical and research departments complicated the data governance challenge because each had different data requirements.
Li said Peter Mac selected Databricks to help it harmonize data across the center, support advanced analytics, including AI, and meet healthcare data security and privacy requirements.
Expanding into new AI use cases
Peter Mac first tested the AI potential of the Databricks platform with an AI transformation pilot project:
- The center created an end-to-end AI lifecycle, which involved applying deep learning to analyze gigapixel whole-slide images to quantify a new biomarker for breast cancer prognosis.
- Databricks supported the AI lifecycle, from initial data ingestion to model deployment and monitoring, which Li said made the project time- and cost-efficient.
- The results of the project could have “great promise” for enhancing breast cancer prognosis.
Li highlighted the significant advantage of speed across the project: “We estimate that with Databricks, we have sped up the development process fivefold and reduced communication overheads across stakeholders by tenfold, allowing us to bring innovations to the market earlier to benefit patients.”
AI strategy now includes future projects
AI has grown into a more significant part of Peter Mac’s strategy. Databricks supports the cancer center in three additional use cases: genomics, radiation oncology, and cancer imaging. Additionally, Peter Mac is:
- Extending its work to include bioinformatics, which covers population genetics projects that involve large sample sizes and large amounts of genomic data.
- Applying advances in large language models and retrieval-augmented generation to extract knowledge from clinical and radiology reports.
- Planning to implement LLMs for genomics and transcriptomics research, which analyzes RNA, or the transcriptome, to remain competitive in cancer research.
John Holland aims to unify data across construction operations
Meanwhile, John Holland managed 80 large-scale infrastructure projects worth AUD 13.2 billion in 2023. However, Travis Rousell, the company’s head of data and analytics, said its legacy data warehouse environment was fragmented and complex to integrate.
“We’ve got all the typical problems everybody’s had historically with data warehouses and data problems,” Rousell said. “Our legacy data warehouse environment was built incrementally over 20 years. It’s slowly evolved and developed out, and we’ve created this swampy set of data silos.”
Rousell added, “We could build BI [Business Intelligence] and reports on top of those, but joining that data together to create insights into the flow of activities and behaviors that are occurring so that we can drive change across our business has been a complicated process for us.”
A unified data platform to deliver valuable insights
John Holland set out to create a unified data platform to unlock data for business value. This was part of the group’s effort to drive innovation and competitive advantage in its industry through modern data and digital practices as part of a broader digital transformation push.
The organization has sought to:
- Provide a unified and integrated view of data across the business.
- Manage governance of data across separately managed projects.
- Achieve a focus on data engineering rather than platform engineering.
Cost savings come from better data management
So far, John Holland has delivered several core business processes to Databricks’ data lake, including project management, project operations, project controls, safety, and fleet analytics.
As a result of using Databricks, Rousell said that John Holland had:
- Reduced platform infrastructure costs by 46% on like-for-like workflows compared with legacy environments.
- Reduced data engineering development effort and time by 30% when building out new data products and models.
- Migrated over 600 users to data products provisioned through the Databricks data lakehouse.
Becoming an enabler for John Holland’s business
Rousell said Databricks ensures IT and technology do not constrain the business from progressing.
“I think the biggest thing we’re achieving by doing this is creating this data culture of ‘yes’ within John Holland,” Rousell explained. “Historically, the difficulty in provisioning new and innovative products has meant we’ve had to stand up large, slow projects and underdeliver for the business.
“Now, if the business has an idea, we can say yes; we can deploy them a data workspace that gives them access to all the capability and tooling they’ll need, and they can go and build that at speed.”