This post was written with Bryan Woolgar-O’Neil, Jamie Cockrill, and Adrian Cunliffe from Harmonic Security.
Organizations face growing challenges protecting sensitive data while supporting third-party generative AI tools. Harmonic Security, a cybersecurity company, developed an AI governance and control layer that spots sensitive data in line as employees use AI, giving security teams the power to keep PII, source code, and payroll information safe while the business accelerates.
The following screenshot demonstrates Harmonic Security’s software tool, highlighting the different data leakage detection types, including Employee PII, Employee Financial Information, and Source Code.
Harmonic Security’s solution is also now available on AWS Marketplace, enabling organizations to deploy enterprise-grade data leakage protection with seamless AWS integration. The platform provides prompt-level visibility into GenAI usage, real-time coaching at the point of risk, and detection of high-risk AI applications, all powered by the optimized models described in this post.
The initial version of their system was effective, but with a detection latency of 1–2 seconds, there was an opportunity to further enhance its capabilities and improve the overall user experience. To achieve this, Harmonic Security partnered with the AWS Generative AI Innovation Center to optimize their system with four key objectives:
- Reduce detection latency to under 500 milliseconds at the 95th percentile
- Maintain detection accuracy across monitored data types
- Continue to support EU data residency compliance
- Enable a scalable architecture for production loads
This post walks through how Harmonic Security used Amazon SageMaker AI, Amazon Bedrock, and Amazon Nova Pro to fine-tune a ModernBERT model, achieving low-latency, accurate, and scalable data leakage detection.
Solution overview
Harmonic Security’s initial data leakage detection system relied on an 8 billion (8B) parameter model, which effectively identified sensitive data but incurred 1–2 seconds of latency, close to the threshold at which user experience is affected. To achieve sub-500 millisecond latency while maintaining accuracy, we developed two classification approaches using a fine-tuned ModernBERT model.
First, a binary classification model was prioritized to detect mergers and acquisitions (M&A) content, a critical category for helping prevent sensitive data leaks. We initially focused on binary classification because it was the simplest approach that would integrate seamlessly with their current system, which invokes multiple binary classification models in parallel. Second, as an extension, we explored a multi-label classification model to detect multiple sensitive data types (such as billing information, financial projections, and employment records) in a single pass, aiming to reduce the computational overhead of running multiple parallel binary classifiers. Although the multi-label approach showed promise for future scalability, Harmonic Security decided to stay with the binary classification model for the initial version.
The solution uses the following key services: Amazon SageMaker AI, Amazon Bedrock, and Amazon Nova Pro.
The following diagram illustrates the solution architecture for low-latency inference and scalability.
The architecture consists of the following components:
- Model artifacts are stored in Amazon Simple Storage Service (Amazon S3)
- A custom container with inference code is hosted in Amazon Elastic Container Registry (Amazon ECR)
- A SageMaker endpoint uses ml.g5.4xlarge instances for GPU-accelerated inference
- Amazon CloudWatch monitors invocations, triggering auto scaling to adjust the instance count (1–5) based on a threshold of 830 requests per minute (RPM)
The solution supports the following features:
- Sub-500 millisecond inference latency
- EU AWS Region deployment support
- Automatic scaling between 1–5 instances based on demand
- Cost optimization during low-usage periods
Synthetic data generation
High-quality training data for sensitive information (such as M&A documents and financial data) is scarce. We used Meta Llama 3.3 70B Instruct and Amazon Nova Pro to generate synthetic data, expanding on Harmonic’s existing dataset, which included examples of data in the following categories: M&A, billing information, financial projections, employment records, sales pipeline, and investment portfolio. The following diagram provides a high-level overview of the synthetic data generation process.
Data generation framework
The synthetic data generation framework comprises a series of steps, including:
- Smart example selection – K-means clustering on sentence embeddings supports diverse example selection
- Adaptive prompts – Prompts incorporate domain knowledge, with temperature (0.7–0.85) and top-p sampling adjusted per category
- Near-miss augmentation – Negative examples that resemble positive cases are added to improve precision
- Validation – An LLM-as-a-judge approach using Amazon Nova Pro and Meta Llama 3 validates examples for relevance and quality
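As a rough illustration of the example selection step, the following sketch clusters toy embedding vectors with a small pure-Python K-means and keeps the example closest to each centroid. The 2-D vectors, the helper names (`kmeans`, `select_diverse`), and the farthest-first initialization are assumptions made for illustration; the embedding model and clustering settings used in the project are not described here.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def _farthest_first(points, k):
    # Deterministic init (an assumption): start at the first point, then
    # repeatedly add the point farthest from the centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(tuple(nxt))
    return centroids

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm: assign points to nearest centroid, then recompute means."""
    centroids = _farthest_first(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = _mean(members)
    return centroids, assignment

def select_diverse(examples, embeddings, k):
    """Pick one representative example per cluster (the one nearest its centroid)."""
    centroids, assignment = kmeans(embeddings, k)
    chosen = []
    for c in range(k):
        idx = [i for i, a in enumerate(assignment) if a == c]
        if idx:
            best = min(idx, key=lambda i: _dist(embeddings[i], centroids[c]))
            chosen.append(examples[best])
    return chosen

# Toy data: three loose topic groups with hypothetical 2-D "embeddings".
examples = ["merger memo A", "merger memo B", "invoice A",
            "invoice B", "pipeline A", "pipeline B"]
embeddings = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0),
              (5.1, 4.9), (10.0, 0.0), (10.1, 0.2)]
seeds = select_diverse(examples, embeddings, k=3)
print(seeds)
```

Selecting one seed per cluster keeps the few-shot examples in a prompt from all coming from the same region of the embedding space, which is the diversity goal the step describes.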
Binary classification
For the binary M&A classification task, we generated three distinct types of examples:
- Positive examples – These contained explicit M&A information while maintaining realistic document structures and finance-specific language patterns. They included key indicators like “merger,” “acquisition,” “deal terms,” and “synergy estimates.”
- Negative examples – We created domain-relevant content that deliberately avoided M&A characteristics while remaining contextually appropriate for business communications.
- Near-miss examples – These resembled positive examples but fell just outside the classification boundary, for instance, documents discussing strategic partnerships or joint ventures that didn’t constitute actual M&A activity.
The generation process maintained careful proportions between these example types, with particular emphasis on near-miss examples to address precision requirements.
Multi-label classification
For the more complex multi-label classification task across four sensitive information categories, we developed a more sophisticated generation strategy:
- Single-label examples – We generated examples containing information relevant to exactly one category to establish clear category-specific features
- Multi-label examples – We created examples spanning multiple categories with controlled distributions, covering various combinations (2–4 labels)
- Category-specific requirements – For each category, we defined mandatory elements to keep associations explicit rather than implied:
  - Financial projections – Forward-looking revenue and growth data
  - Investment portfolio – Details about holdings and performance metrics
  - Billing and payment information – Invoices and supplier accounts
  - Sales pipeline – Opportunities and projected revenue
Our multi-label generation prioritized realistic co-occurrence patterns between categories while maintaining sufficient representation of individual categories and their combinations. As a result, synthetic data increased the number of training examples by 10 times (binary) and 15 times (multi-label). It also improved class balance, because we generated the data with a more balanced label distribution.
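To make the label-combination space concrete, the following minimal sketch enumerates the single-label and multi-label (2–4 labels) targets for the four categories listed above. The snake_case category identifiers and the `label_combinations` helper are illustrative assumptions; how examples were actually distributed across combinations is not specified here.

```python
from itertools import combinations

# Hypothetical identifiers for the four categories named in the text.
CATEGORIES = [
    "financial_projections",
    "investment_portfolio",
    "billing_and_payment",
    "sales_pipeline",
]

def label_combinations(categories, min_labels, max_labels):
    """All label sets whose size lies between min_labels and max_labels."""
    combos = []
    for size in range(min_labels, max_labels + 1):
        combos.extend(combinations(categories, size))
    return combos

single_label = label_combinations(CATEGORIES, 1, 1)
multi_label = label_combinations(CATEGORIES, 2, 4)

print(len(single_label))  # 4 single-label targets
print(len(multi_label))   # 6 pairs + 4 triples + 1 quadruple = 11
```

A generation plan could then assign a target count to each of these 15 label sets, weighting realistic co-occurrences more heavily.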
Model fine-tuning
We fine-tuned ModernBERT models on SageMaker to achieve low latency and high accuracy. Compared with decoder-only models such as Meta Llama 3.2 3B and Google Gemma 2 2B, ModernBERT’s compact size (149M and 395M parameters) translated into lower latency while still delivering higher accuracy. We therefore selected ModernBERT over fine-tuning these alternatives. In addition, ModernBERT is one of the few BERT-based models that supports context lengths of up to 8,192 tokens, which was a key requirement for our project.
Binary classification model
Our first fine-tuned model used ModernBERT-base, and we focused on binary classification of M&A content. We approached this task methodically:
- Data preparation – We enriched our M&A dataset with the synthetically generated data
- Framework selection – We used the Hugging Face transformers library with the Trainer API in a PyTorch environment, running on SageMaker
- Training process – Our process included:
  - Stratified sampling to maintain label distribution across training and evaluation sets
  - Specialized tokenization with sequence lengths up to 3,000 tokens to match what the customer had in production
  - Binary cross-entropy loss optimization
  - Early stopping based on F1 score to help prevent overfitting
The result was a fine-tuned model that could distinguish M&A content from non-sensitive information with a higher F1 score than the 8B parameter model.
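The following framework-free sketch illustrates three of the ideas in the training process: a stratified split, the F1 metric, and patience-based early stopping on F1. It is not the Trainer-based code used in the project; the helper names, the 20% evaluation fraction, and the patience value are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, eval_fraction=0.2, seed=0):
    """Return (train_idx, eval_idx) with each label's share preserved in both sets."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    rng = random.Random(seed)
    train, evaluation = [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        n_eval = max(1, round(len(idx) * eval_fraction))
        evaluation.extend(idx[:n_eval])
        train.extend(idx[n_eval:])
    return sorted(train), sorted(evaluation)

def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def early_stopping_epoch(f1_per_epoch, patience=2):
    """Epoch index at which training stops: no F1 gain for `patience` epochs."""
    best, best_epoch = float("-inf"), -1
    for epoch, f1 in enumerate(f1_per_epoch):
        if f1 > best:
            best, best_epoch = f1, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(f1_per_epoch) - 1

# Toy imbalanced dataset: 20 positive (M&A) and 80 negative labels.
labels = [1] * 20 + [0] * 80
train_idx, eval_idx = stratified_split(labels)
eval_positives = sum(labels[i] for i in eval_idx)
stop_at = early_stopping_epoch([0.80, 0.85, 0.84, 0.83, 0.90])
print(len(eval_idx), eval_positives, stop_at)
```

In the toy run, the evaluation set keeps the same 20% positive share as the full dataset, and training halts at epoch 3 because F1 has not improved since epoch 1.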
Multi-label classification model
For our second model, we tackled the more complex challenge of multi-label classification (detecting multiple sensitive data types simultaneously within a single text passage). We fine-tuned a ModernBERT-large model to identify various sensitive data types, such as billing information, employment records, and financial projections, in a single pass. This required:
- Multi-hot label encoding – We converted our categories into vector format for simultaneous prediction.
- Focal loss implementation – Instead of standard cross-entropy loss, we implemented a custom FocalLossTrainer class. Unlike static weighted loss functions, focal loss adaptively down-weights easy examples during training. This helps the model concentrate on difficult cases, significantly improving performance for less frequent or harder-to-detect classes.
- Specialized configuration – We added a configurable threshold (for example, 0.1 to 0.8) on each class probability to determine label assignment, because we observed varying performance at different decision boundaries.
This approach enabled our system to identify multiple sensitive data types in a single inference pass.
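The three ingredients above can be sketched in plain Python as follows. This is not the FocalLossTrainer class from the project; it is a minimal illustration of multi-hot encoding, the standard binary focal loss term -(1 - p_t)^gamma * log(p_t) without class weighting, and per-class thresholds. The label names, logits, and threshold values are hypothetical.

```python
import math

# Hypothetical label set for illustration.
LABELS = ["billing_information", "employment_records", "financial_projections"]

def multi_hot(active, labels=LABELS):
    """Encode a set of active label names as a 0/1 vector."""
    return [1.0 if name in active else 0.0 for name in labels]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_focal_loss(logits, targets, gamma=2.0):
    """Mean over labels of -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class for each label.
    With gamma = 0 this reduces to binary cross-entropy.
    """
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        p_t = p if t == 1.0 else 1.0 - p
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(logits)

def assign_labels(logits, thresholds, labels=LABELS):
    """Assign each label whose probability meets its own threshold."""
    return [
        name for name, z, th in zip(labels, logits, thresholds)
        if sigmoid(z) >= th
    ]

target = multi_hot({"billing_information", "financial_projections"})
easy = binary_focal_loss([2.0], [1.0])    # confident and correct
hard = binary_focal_loss([-2.0], [1.0])   # confident and wrong
predicted = assign_labels([2.0, -2.0, 0.0], thresholds=[0.5, 0.1, 0.6])
print(target, round(easy, 4), round(hard, 4), predicted)
```

The easy example contributes far less loss than the hard one, which is the down-weighting effect described above. The per-class thresholds show why tuning matters: a probability of about 0.12 is accepted under a 0.1 threshold, while 0.5 is rejected under a 0.6 threshold.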
Hyperparameter optimization
To find the optimal configuration for our models, we used Optuna to optimize key parameters. Optuna is an open-source hyperparameter optimization (HPO) framework that helps find the best hyperparameters for a given machine learning (ML) model by running many experiments (called trials). It uses a Bayesian algorithm called Tree-structured Parzen Estimator (TPE) to choose promising hyperparameter combinations based on past results.
The search space explored numerous combinations of key hyperparameters, as listed in the following table.
| Hyperparameter | Range |
|---|---|
| Learning rate | 5e-6–5e-5 |
| Weight decay | 0.01–0.5 |
| Warmup ratio | 0.0–0.2 |
| Dropout rate | 0.1–0.5 |
| Batch size | 16, 24, 32 |
| Gradient accumulation steps | 1, 4 |
| Focal loss gamma (multi-label only) | 1.0–3.0 |
| Class threshold (multi-label only) | 0.1–0.8 |
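As a dependency-free illustration of this search space, the sketch below draws random configurations from the same ranges. Note that this is plain random sampling, not Optuna's TPE sampler, and that sampling the learning rate on a log scale is an assumption; the dictionary keys and the `sample_config` helper are hypothetical names.

```python
import math
import random

# Ranges copied from the table; distribution types are assumptions.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 5e-6, 5e-5),
    "weight_decay": ("uniform", 0.01, 0.5),
    "warmup_ratio": ("uniform", 0.0, 0.2),
    "dropout": ("uniform", 0.1, 0.5),
    "batch_size": ("choice", [16, 24, 32]),
    "gradient_accumulation_steps": ("choice", [1, 4]),
}
MULTI_LABEL_ONLY = {
    "focal_loss_gamma": ("uniform", 1.0, 3.0),
    "class_threshold": ("uniform", 0.1, 0.8),
}

def sample_config(rng, multi_label=False):
    """Draw one trial configuration at random from the search space."""
    space = dict(SEARCH_SPACE)
    if multi_label:
        space.update(MULTI_LABEL_ONLY)
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "uniform":
            config[name] = rng.uniform(spec[1], spec[2])
        elif kind == "log_uniform":
            low, high = math.log(spec[1]), math.log(spec[2])
            config[name] = math.exp(rng.uniform(low, high))
        else:  # "choice"
            config[name] = rng.choice(spec[1])
    return config

rng = random.Random(42)
binary_trial = sample_config(rng)
multi_label_trial = sample_config(rng, multi_label=True)
print(sorted(binary_trial))
print(sorted(multi_label_trial))
```

A sampler such as TPE differs from this sketch in that it uses the scores of earlier trials to bias later draws toward promising regions.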
To optimize computational resources, we implemented pruning logic to stop under-performing trials early, so we could discard less promising configurations. As seen in the following Optuna HPO history plot, trial 42 had the best parameters, with the highest F1 score for binary classification, while trial 32 was the best for multi-label classification.
Moreover, our analysis showed that dropout and learning rate were the most important hyperparameters, accounting for 48% and 21% of the variance of the F1 score, respectively, for the binary classification model. This explained why we noticed the model overfitting quickly during earlier runs, and it underscores the importance of regularization.
After the optimization experiments, we observed the following:
- We were able to identify the optimal hyperparameters for each task
- The models converged faster during training
- The final performance metrics showed measurable improvements over configurations we tested manually
Running hyperparameter tuning in an automated fashion allowed our models to achieve a high F1 score efficiently, which is crucial for production deployment.
Load testing and autoscaling policy
After fine-tuning and deploying the optimized model to a SageMaker real-time endpoint, we performed load testing to validate performance and auto scaling under pressure against Harmonic Security’s latency, throughput, and elasticity needs. The objectives of the load testing were:
- Validate the latency SLA, with a median of less than 500 milliseconds and a P95 of approximately 1 second across varying loads
- Determine throughput capacity, measured as the maximum RPM on ml.g5.4xlarge instances within the latency SLA
- Inform the auto scaling policy design
The methodology involved the following:
- Traffic simulation – Locust simulated concurrent user traffic with varying text lengths (50–9,999 characters)
- Load pattern – We ran stepped ramp-up tests (60–2,000 RPM, 60 seconds each) to identify bottlenecks and stress-test limits
As shown in the following graph, we found that the maximum throughput under a latency of 1 second was 1,185 RPM, so we set the auto scaling threshold to 70% of that, at 830 RPM.
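The threshold arithmetic can be checked with a few lines of Python. The sketch assumes the 829.5 RPM result was rounded up to 830, and the `instances_needed` helper is a hypothetical simplification of what a per-instance target implies for fleet size within the 1–5 instance bounds.

```python
import math

MAX_RPM_WITHIN_SLA = 1185  # measured maximum throughput under 1 second latency

def scaling_threshold(max_rpm, headroom=0.70):
    """Per-instance target: 70% of measured capacity, rounded up (assumption)."""
    return math.ceil(max_rpm * headroom)

def instances_needed(total_rpm, target_rpm, min_instances=1, max_instances=5):
    """Instances required to keep per-instance load at or below the target."""
    wanted = math.ceil(total_rpm / target_rpm)
    return max(min_instances, min(max_instances, wanted))

threshold = scaling_threshold(MAX_RPM_WITHIN_SLA)
print(threshold)                            # 830
print(instances_needed(2000, threshold))    # 3
```

Keeping the target at 70% of measured capacity leaves headroom for traffic to grow while a new instance is still starting up.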
Based on the performance observed during load testing, we configured a target-tracking auto scaling policy for the SageMaker endpoint using Application Auto Scaling. The following figure illustrates this policy workflow.
The key parameters defined were:
- Metric – SageMakerVariantInvocationsPerInstance (830 invocations/instance/minute)
- Min/Max instances – 1–5
- Cooldown – Scale-out 300 seconds, scale-in 600 seconds
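A policy with these parameters could be expressed through the Application Auto Scaling API roughly as follows. This is a sketch, not the project's deployment code: the endpoint name, variant name, and policy name are hypothetical, and `apply_scaling_config` requires boto3 and AWS credentials, so the example only builds and prints the request dictionaries.

```python
def build_scaling_config(endpoint_name, variant_name="AllTraffic"):
    """Build request parameters for register_scalable_target and put_scaling_policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": 1,
        "MaxCapacity": 5,
    }
    policy = {
        "PolicyName": "invocations-per-instance-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 830.0,  # invocations per instance per minute
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 300,  # seconds
            "ScaleInCooldown": 600,   # seconds
        },
    }
    return target, policy

def apply_scaling_config(target, policy):
    """Send both requests to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the dictionaries can be inspected offline
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)

# "dlp-modernbert-endpoint" is a placeholder endpoint name.
scalable_target, scaling_policy = build_scaling_config("dlp-modernbert-endpoint")
print(scalable_target["ResourceId"])
```

The longer scale-in cooldown makes the fleet slower to shrink than to grow, which favors latency over cost during bursty traffic.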
This target-tracking policy adjusts the number of instances based on traffic, maintaining performance and cost-efficiency. The following table summarizes our findings.
| Model | Requests per minute |
|---|---|
| 8B model | 800 |
| ModernBERT with auto scaling (up to 5 instances) | 1,185–5,925 |
| Additional capacity (ModernBERT vs. 8B model) | 48%–640% |
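The capacity range in the table follows from the per-instance throughput, assuming throughput scales linearly from one to five instances (an assumption of this sketch). The `extra_capacity_pct` helper is a hypothetical name.

```python
BASELINE_RPM = 800        # 8B model
PER_INSTANCE_RPM = 1185   # ModernBERT, one ml.g5.4xlarge instance
MAX_INSTANCES = 5

def extra_capacity_pct(new_rpm, baseline_rpm):
    """Additional capacity relative to the baseline, in percent."""
    return (new_rpm / baseline_rpm - 1.0) * 100.0

fleet_rpm = PER_INSTANCE_RPM * MAX_INSTANCES            # 5,925 RPM
low = extra_capacity_pct(PER_INSTANCE_RPM, BASELINE_RPM)
high = extra_capacity_pct(fleet_rpm, BASELINE_RPM)
print(fleet_rpm, low, high)
```

The results are about 48.1% for a single instance and about 640.6% for five instances, consistent with the 48%–640% range reported in the table once the fractional part is dropped.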
Results
This section presents the impact of the fine-tuning and optimization efforts on Harmonic Security’s data leakage detection system, with a primary focus on latency reduction. Absolute latency improvements are detailed first, showing that the sub-500 millisecond target was met, followed by an overview of accuracy improvements. The following subsections provide detailed results for binary M&A classification and multi-label classification across multiple sensitive data types.
Binary classification
We evaluated the fine-tuned ModernBERT-base model for binary M&A classification against the baseline 8B model introduced in the solution overview. The most notable result was a large reduction in latency, addressing the initial 1–2 second delay that risked disrupting user experience. The resulting sub-500 millisecond latency is detailed in the following table.
| Model | Median (ms) | P95 (ms) | P99 (ms) | P100 (ms) |
|---|---|---|---|---|
| ModernBERT-base-v2 | 46.03 | 81.19 | 102.37 | 183.11 |
| 8B model | 189.15 | 259.99 | 286.63 | 346.36 |
| Difference | -75.66% | -68.77% | -64.28% | -47.13% |
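The difference row is the relative change of each ModernBERT percentile against the 8B baseline, which can be reproduced as follows. The dictionary keys and the `pct_change` helper are illustrative names; the numbers are copied from the table.

```python
modernbert_ms = {"median": 46.03, "p95": 81.19, "p99": 102.37, "p100": 183.11}
baseline_ms = {"median": 189.15, "p95": 259.99, "p99": 286.63, "p100": 346.36}

def pct_change(new_ms, old_ms):
    """Relative change in percent; negative values mean lower latency."""
    return (new_ms - old_ms) / old_ms * 100.0

difference = {
    name: pct_change(modernbert_ms[name], baseline_ms[name])
    for name in modernbert_ms
}
for name, value in difference.items():
    print(f"{name}: {value:.2f}%")
```

For example, the median works out to (46.03 - 189.15) / 189.15, or roughly a 75.7% reduction.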
Building on this latency improvement, the following performance metrics reflect percentage improvements in accuracy and F1 score relative to the 8B model.
| Model | Accuracy improvement | F1 improvement |
|---|---|---|
| ModernBERT-base-v2 | +1.56% | +2.26% |
| 8B model (baseline) | – | – |
These results show that ModernBERT-base-v2 delivers a substantial latency reduction, complemented by modest accuracy and F1 improvements of 1.56% and 2.26%, respectively, aligning with Harmonic Security’s objective of enhancing data leakage detection without impacting user experience.
Multi-label classification
We evaluated the fine-tuned ModernBERT-large model for multi-label classification against the baseline 8B model, with latency reduction as the primary goal. Latency decreased across all evaluated categories, staying under 500 milliseconds and addressing the previous 1–2 second bottleneck. The latency results are shown in the following table.
| Dataset | Model | Median (ms) | P95 (ms) | P99 (ms) |
|---|---|---|---|---|
| Billing and payment | 8B model | 198 | 238 | 321 |
| Billing and payment | ModernBERT-large | 158 | 199 | 246 |
| Billing and payment | Difference | -20.13% | -16.62% | -23.60% |
| Sales pipeline | 8B model | 194 | 265 | 341 |
| Sales pipeline | ModernBERT-large | 162 | 243 | 293 |
| Sales pipeline | Difference | -16.63% | -8.31% | -13.97% |
| Financial projections | 8B model | 384 | 510 | 556 |
| Financial projections | ModernBERT-large | 160 | 275 | 310 |
| Financial projections | Difference | -58.24% | -46.04% | -44.19% |
| Investment portfolio | 8B model | 397 | 498 | 703 |
| Investment portfolio | ModernBERT-large | 160 | 259 | 292 |
| Investment portfolio | Difference | -59.69% | -47.86% | -58.46% |
This approach also delivered a second key benefit: less computational parallelism, because multiple classifications are consolidated into a single pass. However, the multi-label model struggled to maintain consistent accuracy across all classes. Although categories like Financial Projections and Investment Portfolio showed promising accuracy gains, others such as Billing and Payment and Sales Pipeline experienced significant accuracy declines. This indicates that, despite its latency and parallelism advantages, the approach requires further development to maintain reliable accuracy across data types.
Conclusion
In this post, we explored how Harmonic Security collaborated with the AWS Generative AI Innovation Center to optimize their data leakage detection system, achieving the following results.
Key performance improvements:
- Latency reduction: From 1–2 seconds to under 500 milliseconds (76% reduction at the median)
- Throughput increase: 48%–640% additional capacity with auto scaling
- Accuracy gains: +1.56% for binary classification, with maintained precision across categories
By using SageMaker, Amazon Bedrock, and Amazon Nova Pro, Harmonic Security fine-tuned ModernBERT models that deliver sub-500 millisecond inference in production, meeting stringent performance targets while supporting EU compliance and establishing a scalable architecture.
This partnership showcases how tailored AI solutions can tackle critical cybersecurity challenges without hindering productivity. Harmonic Security’s solution is now available on AWS Marketplace, enabling organizations to adopt AI tools safely while protecting sensitive data in real time. Looking ahead, these high-speed models have the potential to add further controls for more AI workflows.
To learn more, consider the following next steps:
- Try Harmonic Security – Deploy the solution directly from AWS Marketplace to protect your organization’s GenAI usage
- Explore AWS services – Dive into SageMaker, Amazon Bedrock, and Amazon Nova Pro to build advanced AI-driven security solutions. Visit the AWS Generative AI page for resources and tutorials.
- Deep dive into fine-tuning – Explore the AWS Machine Learning Blog for in-depth guides on fine-tuning LLMs for specialized use cases.
- Stay updated – Subscribe to the AWS Podcast for weekly insights on AI innovations and practical applications.
- Connect with experts – Join the AWS Partner Network to collaborate with experts and scale your AI initiatives.
- Attend AWS events – Register for AWS re:Invent to explore cutting-edge AI advancements and network with industry leaders.
By adopting these steps, organizations can harness AI-driven cybersecurity to maintain robust data protection and seamless user experiences across diverse workflows.
About the authors
Babs Khalidson is a Deep Learning Architect at the AWS Generative AI Innovation Centre in London, where he specializes in fine-tuning large language models, building AI agents, and model deployment solutions. He has over 6 years of experience in artificial intelligence and machine learning across finance and cloud computing, with expertise spanning from research to production deployment.
Vushesh Babu Adhikari is a Data Scientist at the AWS Generative AI Innovation Center in London with extensive expertise in developing generative AI solutions across diverse industries. He has over 7 years of experience across industries including finance, telecom, and information technology, with specialized expertise in machine learning and artificial intelligence.
Zainab Afolabi is a Senior Data Scientist at the AWS Generative AI Innovation Centre in London, where she leverages her extensive expertise to develop transformative AI solutions across diverse industries. She has over 9 years of specialized experience in artificial intelligence and machine learning, as well as a passion for translating complex technical concepts into practical business applications.
Nuno Castro is a Sr. Applied Science Manager at the AWS Generative AI Innovation Center. He leads generative AI customer engagements, helping AWS customers find the most impactful use case from ideation and prototype through to production. He has 19 years of experience in the field in industries such as finance, manufacturing, and travel, and has led ML teams for 11 years.
Christelle Xu is a Senior Generative AI Strategist who leads model customization and optimization strategy across EMEA within the AWS Generative AI Innovation Center, working with customers to deliver scalable generative AI solutions, focusing on continued pre-training, fine-tuning, reinforcement learning, and training and inference optimization. She holds a Master’s degree in Statistics from the University of Geneva and a Bachelor’s degree from Brigham Young University.
Manuel Gomez is a Solutions Architect at AWS supporting generative AI startups across the UK and Ireland. He works with model producers, fine-tuning platforms, and agentic AI applications to design secure and scalable architectures. Before AWS, he worked in startups and consulting, and he has a background in industrial technologies and IoT. He is particularly interested in how multi-modal AI can be applied to real industry problems.
Bryan Woolgar-O’Neil is the co-founder and CTO at Harmonic Security. With over 20 years of software development experience, the last 10 were dedicated to building the threat intelligence company Digital Shadows, which was acquired by ReliaQuest in 2022. His expertise lies in developing products based on cutting-edge software, focusing on making sense of large volumes of data.
Jamie Cockrill is the Director of Machine Learning at Harmonic Security, where he leads a team focused on building, training, and refining Harmonic’s Small Language Models.
Adrian Cunliffe is a Senior Machine Learning Engineer at Harmonic Security, where he focuses on scaling Harmonic’s Machine Learning engine that powers Harmonic’s proprietary models.






