Post-Merger IT Integration through Next-Generation Data Engineering Solutions

Abstract

By leveraging Microsoft Fabric’s OneLake, SELISE helps clients overcome data management challenges caused by scattered, duplicate, and redundant data spread across very large volumes and diverse data sources. As a result, clients see gains in operational efficiency that exceed those of traditional data warehousing exercises several times over. This article describes how SELISE raised operational excellence in a representative post-merger system integration scenario.

Introduction

When SELISE was addressing the data-oriented challenges faced by a prospective client, Microsoft Fabric emerged as a leading solution that combines high scalability with cost-effectiveness. Its robust support for tried-and-tested data engineering frameworks and AI capabilities stood out from the outset.

7 Reasons why SELISE chose Microsoft Fabric

  1. Strategic Problem-Solving: SELISE’s data experts leveraged Microsoft Fabric for invaluable insights and strategic problem-solving. The tool’s lean design and intuitive interface, along with features like out-of-the-box lakehouse creation, Power BI support, and compatibility with established data engineering languages, allowed SELISE to focus on strategy without extensive technical analysis.

  2. Consistent Support: SELISE’s global operations and 24/7 incident management service desk require solutions with strong technical support and quick incident response, qualities found in Microsoft Fabric. As an early adopter of such tools, SELISE, a Microsoft Gold Partner, needed the backing of the product teams at Microsoft.

  3. Near Zero Downtime: SELISE requires SaaS solutions with minimal downtime because its B2B and B2B2C clients rely critically on their business solutions. Microsoft Fabric’s promise of 99.99% uptime builds significant confidence, especially since SELISE already uses the dependable Microsoft Azure platform for its microservices cloud infrastructure and runs numerous applications on the Azure Kubernetes platform.

  4. Scalability: Microsoft Fabric’s application availability zones ensure high availability and performance by managing data in scalable, distributed data centers. SELISE benefited from this by starting with smaller experimental deployments under an innovative SKU-based hardware capability model and scaling up to higher SKU tiers for production-grade deployment as needed.

  5. Efficient Data Management: Microsoft Fabric is an evolution of Azure Synapse and brings a unified approach to managing data sources, data pipelines, and data science using already familiar tools such as Azure Data Factory, Apache Spark, SQL, and Power BI. As a result, SELISE gained operational efficiency in data management with a minimal learning curve.

  6. Cost Effectiveness: Microsoft Fabric’s 60-day trial period and lack of upfront investment made it an attractive choice for SELISE as a data engineering platform. Additionally, a cost feasibility analysis showed that Fabric’s per-second compute unit billing model is more cost-effective than other providers’ pay-as-you-go models that require initial investments.

  7. Developer Friendliness: SELISE’s developers benefited from Microsoft Fabric’s extensive and helpful documentation, which is consistent with Microsoft’s reputation for providing comprehensive support materials for all its products and programming tools. This documentation significantly enhanced developer efficiency.

Post-Merger IT Integration

This generic case study shows how SELISE has helped large organizations accelerate their IT integration work after acquiring new entities.

Background

Two major airfreight companies merged, creating a conglomerate with significant strategic synergy but also substantial IT challenges. The merger resulted in disorganized, redundant, and duplicate data sources, as the entities used different applications for their operations. This lack of data traceability led to complex reporting, degraded operational performance, declining customer satisfaction due to imprecise data origins, and increased human effort required to sift through data clutter, ultimately impacting revenue negatively.

Challenge

The main challenge in the post-merger phase was to manage and consolidate the data sets of the two distinct entities into a single, coherent, and easily accessible system. The redundancy, conflict, and disorganization in the merged data required a flexible approach to processing, cleaning, and deduplicating. Successfully curating and aggregating this data was essential to establish a unified source of truth, which is crucial for making informed decisions and streamlining operations.

Solution

To address these data integration challenges, SELISE began with a pre-study phase to identify the pain points across various teams within the conglomerate and gain a comprehensive understanding of the data ecosystem’s complexity. This involved analyzing multiple business domains and evaluating specific reporting use cases, with an eye toward incorporating machine learning capabilities into the new data platform. A significant hurdle was performing Source to Target Mapping (STTM), because a wide variety of applications were in use for both internal operations and external customer portals. To resolve this, SELISE decided to implement a unified data lake capable of ingesting data from diverse databases, performing multi-level cleaning, and executing relational mapping to create a trackable and consolidated data resource.
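As a rough illustration of what Source to Target Mapping involves, the sketch below renames columns from two source systems into one shared target schema and records where each row came from. It is a minimal sketch, not SELISE's actual mapping: the system names (`system_a`, `system_b`), the column names, and the `to_target` helper are all hypothetical.

```python
# Hypothetical source-to-target mapping (STTM) for two shipment systems.
# Every system, column, and field name here is an illustrative assumption.
STTM = {
    "system_a": {"AWB_NO": "waybill_number", "ORIG": "origin_airport", "WGT_KG": "weight_kg"},
    "system_b": {"airwaybill": "waybill_number", "from_iata": "origin_airport", "weight": "weight_kg"},
}


def to_target(source_system: str, record: dict) -> dict:
    """Rename source columns to target names and keep a lineage column."""
    mapping = STTM[source_system]
    target = {tgt: record[src] for src, tgt in mapping.items() if src in record}
    target["source_system"] = source_system  # lineage, so each row stays traceable
    return target


row_a = to_target("system_a", {"AWB_NO": "123-45678901", "ORIG": "ZRH", "WGT_KG": 120.5})
row_b = to_target("system_b", {"airwaybill": "123-45678901", "from_iata": "ZRH", "weight": 120.5})
```

After mapping, both rows share the same field names, so downstream cleaning and deduplication can treat them uniformly while the `source_system` column preserves traceability.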

SELISE’s strategy unfolded in multiple phases, adopting the widely used Medallion Architecture to tackle the data engineering challenges at hand.

  • Bronze Layer Data Ingestion

    SELISE was tasked with managing large datasets that included geographic, financial, human resources, and customer information, as well as data from IoT devices. We learned that not all of the data was in SQL Server database format: some was unstructured streaming data, some was semi-structured data from SharePoint, and the rest was in standard SQL database format. Connecting to on-premises SQL Server databases also posed a significant obstacle at first. Fortunately, we found that Microsoft Fabric supports various types of data ingestion through Azure Hub, Azure Data Factory, and the On-Premises Data Gateway for on-premises data. This capability significantly simplified the ingestion of data from diverse cloud and on-premises sources into the bronze layer, addressing key challenges in our data management process.

  • Silver Layer Data Cleansing and Deduplication

    Working with the accumulated data from the bronze layer posed a different kind of challenge, since much of the data required normalization. This step transformed the data into an unbiased silver layer from which aggregated versions of the data can easily be created. Thanks to Microsoft Fabric’s comprehensive SQL support, we were able to write custom stored procedures and queries to transform the data, and to leverage Microsoft Scheduling Agents and the ETL processes provided by Azure Data Factory. Not only could we apply our existing SQL expertise directly, but the simple and intuitive workspaces also allowed us to orchestrate collaboration between multiple teams.

    Figure: Different ETL approaches applied depending on the nature of the data

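The cleansing and deduplication idea in the silver layer can be sketched as follows. The case study describes doing this with SQL stored procedures; the Python below is only an illustrative stand-in, and the field names (`customer_name`, `email`, `updated_at`, `source`) and the "keep the most recently updated record" rule are assumptions rather than the project's actual logic.

```python
def normalize_key(record: dict) -> tuple:
    """Build a comparison key that ignores case and irregular whitespace."""
    name = " ".join(record["customer_name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep one record per normalized key, preferring the latest update."""
    best: dict[tuple, dict] = {}
    for rec in records:
        key = normalize_key(rec)
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in best or rec["updated_at"] > best[key]["updated_at"]:
            best[key] = rec
    return list(best.values())


# Hypothetical bronze rows: the first two describe the same customer.
bronze_rows = [
    {"customer_name": "Acme Cargo ", "email": "OPS@ACME.EXAMPLE", "updated_at": "2023-01-10", "source": "system_a"},
    {"customer_name": "acme  cargo", "email": "ops@acme.example", "updated_at": "2023-03-02", "source": "system_b"},
    {"customer_name": "Blue Freight", "email": "info@blue.example", "updated_at": "2023-02-01", "source": "system_a"},
]
silver_rows = deduplicate(bronze_rows)
```

Normalizing before comparing is what lets records from the two merged entities match even when they were entered with different casing or spacing.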
  • Gold Layer Data Aggregation

    The end goal was to produce business-ready data that supports dynamic reporting via Power BI and lets third-party applications communicate with the Fabric ecosystem with little friction. Thanks to Microsoft Fabric’s Direct Lake mode, we developed logically separated SQL data warehouses and exposed read-only data through secure SQL analytics endpoints.

  • Data Propagation

    In this case study, the Slowly Changing Dimension (SCD) Type 2 method was used in data warehousing to manage and monitor data changes over time: each change adds a new record carrying the updated version instead of overwriting the existing one. This approach facilitated the aggregation of data into organized, business-domain-aligned datasets, improving on-demand access for internal reporting teams and enabling efficient Power Query-based Power BI reporting. Additionally, Dataflow Gen2 was instrumental in ensuring robust, scalable, and efficient data movement and transformation across the stages of the data pipeline (bronze, silver, and gold layers), supporting continuous updates and transformations to maintain smooth data flow.
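The SCD Type 2 behavior described above can be sketched in a few lines. This is a minimal in-memory illustration, not the Dataflow Gen2 or warehouse implementation used in the project: the `apply_scd2` helper and the `valid_from`, `valid_to`, and `is_current` column names are assumptions (although they follow a common naming convention for this technique).

```python
from datetime import date


def apply_scd2(dimension: list[dict], incoming: dict, key: str, effective: date) -> None:
    """Apply one incoming record to a dimension table with SCD Type 2 semantics.

    If a tracked attribute changed, the active row for the key is closed and a
    new version is appended. Unchanged records leave the table untouched.
    """
    current = next(
        (row for row in dimension if row[key] == incoming[key] and row["is_current"]),
        None,
    )
    if current is not None:
        unchanged = all(current.get(col) == val for col, val in incoming.items())
        if unchanged:
            return
        # Close the old version instead of overwriting it, so history is kept.
        current["valid_to"] = effective
        current["is_current"] = False
    dimension.append({**incoming, "valid_from": effective, "valid_to": None, "is_current": True})


dim_customer: list[dict] = []
apply_scd2(dim_customer, {"customer_id": 1, "city": "Zurich"}, "customer_id", date(2023, 1, 1))
apply_scd2(dim_customer, {"customer_id": 1, "city": "Zurich"}, "customer_id", date(2023, 2, 1))  # no change
apply_scd2(dim_customer, {"customer_id": 1, "city": "Geneva"}, "customer_id", date(2023, 6, 1))
```

After these three calls the dimension holds two rows for the same customer: a closed "Zurich" version and an active "Geneva" version, which is what allows reports to reconstruct the state of the data at any point in time.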

  • Analytics Enablement

    Once the data had been aggregated and consolidated in the gold layer, the next challenge was to build a semantic layer that allowed data science to be incorporated and supported highly versatile analytics. We leveraged Apache Spark, which is readily available within Microsoft Fabric, to deploy machine learning models and gain insights that would be unachievable through manual effort alone. Notebooks, an excellent feature of Microsoft Fabric, allowed us to apply our full Python expertise to run machine learning experiments as Apache Spark jobs.

Figure: A high-level overview of how SELISE designed the data engineering solution in medallion architecture

The Outcome

The new conglomerate was able to completely transform the way it managed its data ecosystem. Establishing a single source of truth improved the company’s analytical capabilities, which accelerated decision-making processes and improved operational efficiency. This data transformation not only addressed the initial challenges directly but also equipped the merged entity with new capabilities. Consequently, these activities positioned the new conglomerate on a path toward sustained business growth.

Conclusion and Outlook

SELISE is strategically integrating Microsoft Fabric into its operations to stay ahead of emerging market trends and harness the full potential of business data, which is the foundation of any AI-powered system. This commitment involves enhancing IoT data integration capabilities and leveraging machine learning to convert vast data sets into actionable business intelligence. By focusing on real-time data analysis, SELISE aims to develop solutions that provide timely insights across various business operations, enhancing decision-making and operational efficiency.

The company, through its Academy, is also dedicated to enriching its training programs in key areas such as Data Engineering and Data Science, ensuring its workforce is well-equipped to handle the demands of modern data solutions. Looking forward, SELISE’s continued partnership with Microsoft Fabric is set to drive digital transformation, positioning the company as a leader in technological innovation and providing substantial value to its clients. This proactive approach ensures that SELISE not only meets but exceeds client expectations in a rapidly evolving technological landscape.