
Automating Microsoft Fabric & Power BI Governance

How programmatically orchestrating Python scripts, workspace scan endpoints, and semantic model validations saves thousands of dollars by preventing Microsoft Fabric capacity throttling.

A primary issue we consistently observe in business intelligence environments, especially as enterprises migrate from Power BI Premium per Capacity to Azure-backed Microsoft Fabric capacities (F-SKUs), is that costs spiral out of control when developers are left without strict deployment and query guardrails.

Microsoft Fabric operates on a Capacity Unit (CU) model. Every second of DAX evaluation, every excessive iterator (FILTER, SUMX inside huge nested contexts), and every unoptimized measure burns capacity. If your models aren’t properly managed, your capacity gets throttled. The common instinct is to buy more capacity. The correct instinct is to implement programmatic governance.

The Problem With Manual Governance

Historically, ensuring a semantic model was healthy meant firing up Tabular Editor (2 or 3), running the Best Practice Analyzer (BPA) rules locally, and doing ad-hoc investigations. That simply doesn't scale.

When you manage dozens of workspaces spanning multiple production environments, hoping that individual developers remember to hide foreign keys and swap FILTER(Table) for boolean predicates isn’t an architectural strategy; it’s a prayer.

The Real Cost of Neglect

Here’s what uncontrolled semantic models look like in production:

  • Capacity throttling: A single poorly written DAX measure iterating over a 50M row fact table can consume 30-40% of an F4 capacity during peak hours
  • Cascading slowdowns: When one workspace’s refresh triggers throttling, every workspace on that capacity suffers—finance, operations, sales, all frozen
  • Emergency upsizing: Panic-buying F8 or F16 to compensate for poor modeling easily adds $2,000-8,000/month in unnecessary spend
  • Shadow IT: Frustrated users export to Excel and build their own analysis, undermining the entire BI investment

The gap between “we have Power BI” and “we have governed Power BI” is typically 5-10x in operational cost.

Our Approach: The Automated Catalog

At Montinegro Corp, we shifted governance from a reactive, manual step into a proactive, automated pipeline built on Python automation. This system actively guards our own Fabric tenant and those of our clients.

We maintain an internal catalog of specialized Python scripts—each targeting a specific governance domain—that run on schedule or on-demand against the Power BI backend.

1. Extracting Intelligence via the Power BI REST API

These scripts interact directly with the Power BI Admin REST APIs (specifically the WorkspaceInfo endpoints, better known as the Scanner API).

Rather than manually checking workspaces, we execute scheduled jobs that parse the entire topology of the Power BI tenant into memory. This provides deep structural insight without needing to explicitly query the models.

A typical scan produces:

Workspace: PRD_COMERCIAL
  ├─ Dataset: DSN_VENDAS_REGIONAL (45 tables, 23 measures)
  │   ├─ Last refresh: 2026-04-24 06:15:00 UTC (Success)
  │   ├─ Configured memory: 2.4 GB
  │   └─ Endorsement: Certified
  ├─ Dataset: DSN_META_FORECAST (12 tables, 8 measures)
  │   ├─ Last refresh: 2026-04-24 06:30:00 UTC (Failed)
  │   └─ Error: Timeout on partition MART_FORECAST_DETAIL
  └─ Reports: 14 (3 orphaned - no dataset binding)

This data feeds into automated alerts. Failed refreshes trigger immediate investigation. Orphaned reports get flagged for cleanup.
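A minimal sketch of that scan loop, assuming an admin-scoped Entra ID token is already in hand (these are the documented Scanner API routes; paging, retries, and error handling are omitted):

import time
import requests

BASE = "https://api.powerbi.com/v1.0/myorg/admin"

def scan_workspaces(token: str, workspace_ids: list[str]) -> dict:
    """Run a Scanner API scan and return the workspace metadata payload."""
    headers = {"Authorization": f"Bearer {token}"}

    # Kick off an asynchronous scan, requesting dataset schema and lineage.
    resp = requests.post(
        f"{BASE}/workspaces/getInfo",
        headers=headers,
        params={"datasetSchema": "true", "lineage": "true"},
        json={"workspaces": workspace_ids},
    )
    resp.raise_for_status()
    scan_id = resp.json()["id"]

    # Poll until the scan completes, then fetch the full result.
    while requests.get(f"{BASE}/workspaces/scanStatus/{scan_id}",
                       headers=headers).json()["status"] != "Succeeded":
        time.sleep(5)

    return requests.get(f"{BASE}/workspaces/scanResult/{scan_id}",
                        headers=headers).json()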

2. Live DAX Queries via Python (ExecuteQueries Endpoint)

By authenticating with Entra ID (via Azure CLI tokens within sandboxed environments), we fire custom DAX expressions directly from our code at the datasets/{datasetId}/executeQueries endpoint. This lets us periodically benchmark standard query response times across live multi-million-row datasets and catch degradation before a user complains.

We run benchmark DAX queries on a schedule:

benchmark_queries = {
    "simple_aggregation": "EVALUATE ROW(\"Total\", [Total Sales])",
    "time_intelligence": "EVALUATE ROW(\"YTD\", [Sales YTD])",
    "complex_filter": """
        EVALUATE
        CALCULATETABLE(
            SUMMARIZECOLUMNS(
                'Product'[Category],
                "Revenue", [Total Sales],
                "Margin", [Gross Margin %]
            ),
            'Calendar'[Year] = 2026
        )
    """
}

Each query is timed. If simple_aggregation, which normally completes in 120 ms, suddenly takes 800 ms, we know something changed in the model or the capacity is under pressure. This catches performance regressions before users start filing tickets.
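A sketch of that timing harness, reusing the benchmark_queries dict above; the token comes from the Azure CLI's cached login, the dataset GUID is a placeholder, and the 800 ms threshold is illustrative:

import json
import subprocess
import time
import requests

def get_token() -> str:
    # Borrow the Azure CLI's cached login for the Power BI resource.
    out = subprocess.run(
        ["az", "account", "get-access-token",
         "--resource", "https://analysis.windows.net/powerbi/api"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)["accessToken"]

def time_benchmarks(dataset_id: str, queries: dict[str, str]) -> dict[str, float]:
    """Execute each benchmark DAX query and return wall-clock latency in ms."""
    url = f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries"
    headers = {"Authorization": f"Bearer {get_token()}"}
    timings = {}
    for name, dax in queries.items():
        start = time.perf_counter()
        requests.post(url, headers=headers,
                      json={"queries": [{"query": dax}]}).raise_for_status()
        timings[name] = (time.perf_counter() - start) * 1000
    return timings

timings = time_benchmarks("<dataset-guid>", benchmark_queries)
for name, ms in timings.items():
    if ms > 800:  # illustrative; in practice compare to a rolling baseline
        print(f"ALERT: {name} took {ms:.0f} ms")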

3. Automated DMV Scanning (Dynamic Management Views)

Governance is more than speed; it's hygiene. Our orchestration queries the underlying Analysis Services engine via DMVs to detect massive red flags:

  • Are there unhidden key columns surfacing in the field list and breaking the report-building experience?
  • Are mismatched data types causing memory bloat in the DAX engine?
  • Are there excessive bi-directional cross-filtering relationships breaking standard star-schema logic?
  • Are there unused columns consuming memory without appearing in any measure or visual?

Our Python stack automatically parses these XMLA or DMV readouts, categorizes the offenses by severity, and generates structured reports.
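As one way to run such a check from Python, here is a sketch using the pyadomd package against a workspace's XMLA endpoint (it requires the ADOMD.NET client libraries; the workspace and dataset names are the sample ones from the scan above, and authentication details are omitted):

from pyadomd import Pyadomd

# XMLA endpoint of a Premium/Fabric workspace.
CONN = ("Provider=MSOLAP;"
        "Data Source=powerbi://api.powerbi.com/v1.0/myorg/PRD_COMERCIAL;"
        "Initial Catalog=DSN_VENDAS_REGIONAL;")

# TMSCHEMA DMVs expose the model's metadata as queryable rowsets.
RELATIONSHIPS = """
    SELECT [Name], [CrossFilteringBehavior]
    FROM $SYSTEM.TMSCHEMA_RELATIONSHIPS
"""

with Pyadomd(CONN) as conn:
    with conn.cursor().execute(RELATIONSHIPS) as cur:
        for name, behavior in cur.fetchall():
            if behavior == 2:  # 2 = BothDirections in the TMSCHEMA enumeration
                print(f"CRITICAL: bi-directional relationship '{name}'")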

Severity Classification

| Level | Issue | Example | Impact |
| --- | --- | --- | --- |
| Critical | Bi-directional relationships | Fact ↔ Dimension with ambiguity | Security filter leaks, wrong calculations |
| High | Unhidden FK columns | CustomerID visible in field list | User confusion, accidental aggregation |
| Medium | Missing sort-by-column | Month name without sort order | Incorrect visual ordering |
| Low | Unused columns | Imported columns in no measure/visual | Wasted memory, slower refreshes |

Reports are generated as Markdown tables that feed directly into our monitoring dashboards and Slack alerts.
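A sketch of that last mile, assuming findings arrive as (level, issue, example, impact) tuples and a standard Slack incoming webhook:

import requests

def to_markdown(findings: list[tuple[str, str, str, str]]) -> str:
    """Render severity findings as the Markdown table our dashboards ingest."""
    lines = ["| Level | Issue | Example | Impact |",
             "| --- | --- | --- | --- |"]
    lines += [f"| {lvl} | {issue} | {ex} | {impact} |"
              for lvl, issue, ex, impact in findings]
    return "\n".join(lines)

def post_to_slack(webhook_url: str, report: str) -> None:
    # Slack incoming webhooks accept a simple JSON payload with a text field.
    requests.post(webhook_url, json={"text": report}).raise_for_status()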

4. Integration with CI/CD

This is where the governance system reaches its full potential. By pushing .pbip (Power BI Project) files to GitHub, we run these same Python scanning jobs as GitHub Actions checks.

The workflow:

  1. Developer creates or modifies a semantic model locally
  2. They commit the .pbip project file to a feature branch
  3. A Pull Request triggers automated CI checks:
    • Schema validation: Are all tables following naming conventions?
    • BPA rules: Does the model pass all Best Practice Analyzer checks?
    • Relationship audit: Any bi-directional or many-to-many without justification?
    • Measure quality: Are there FILTER(Table) patterns that should be predicates?
  4. If any check fails, the PR is blocked with detailed feedback
  5. If all checks pass, the model is approved for merge to main
  6. Post-merge, the deployment pipeline publishes the model to the target workspace

This means governance enforcement happens before a bad model ever reaches production. Developers get instant, specific feedback: “Line 47 of measures.dax: Replace FILTER(Sales, Sales[Region] = "East") with Sales[Region] = "East" as a direct predicate for 3-5x performance improvement.”
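A minimal sketch of one such CI check, assuming the .pbip folder contains a TMSL model.bim file (the regex is deliberately simplified for illustration):

import json
import re
import sys
from pathlib import Path

# FILTER over a whole table is the anti-pattern we block at PR time;
# a boolean predicate inside CALCULATE is almost always faster.
FILTER_TABLE = re.compile(r"FILTER\s*\(\s*'?\w+'?\s*,", re.IGNORECASE)

def lint_measures(model_bim: Path) -> list[str]:
    model = json.loads(model_bim.read_text(encoding="utf-8"))["model"]
    violations = []
    for table in model.get("tables", []):
        for measure in table.get("measures", []):
            expr = measure.get("expression", "")
            if isinstance(expr, list):  # TMSL stores long expressions as line arrays
                expr = "\n".join(expr)
            if FILTER_TABLE.search(expr):
                violations.append(f"{table['name']}[{measure['name']}]: "
                                  "replace FILTER(Table, ...) with a direct predicate")
    return violations

if __name__ == "__main__":
    issues = lint_measures(Path(sys.argv[1]))
    for issue in issues:
        print(f"::error::{issue}")  # GitHub Actions error annotation
    sys.exit(1 if issues else 0)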

5. Capacity Monitoring and Alerting

Beyond model-level governance, we continuously monitor capacity utilization patterns:

  • CU consumption trends: Are we approaching the 24-hour smoothing threshold?
  • Background vs. interactive split: Are scheduled refreshes competing with user queries?
  • Throttling events: When did throttling start, which operations triggered it, and what was the blast radius?

We pull this data from the Fabric Capacity Metrics app and augment it with our own telemetry. The goal is never to be surprised by a capacity issue—by the time a user notices slowness, we should already know why and be working on the fix.
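The alerting logic itself is simple; what matters is running it continuously. A sketch, with the telemetry shape and thresholds as assumptions (the Capacity Metrics app exposes its data as an ordinary semantic model, so the executeQueries technique from earlier works here too):

from dataclasses import dataclass
from datetime import datetime

@dataclass
class CapacitySample:
    timestamp: datetime
    cu_percent: float         # smoothed CU utilization, 0-100
    interactive_delayed: int  # interactive operations delayed by throttling

def evaluate(samples: list[CapacitySample], warn_at: float = 80.0) -> list[str]:
    """Raise early warnings before smoothing turns into throttling."""
    alerts = []
    hot = [s for s in samples if s.cu_percent >= warn_at]
    if len(hot) >= 3:  # illustrative: several hot samples in the window
        alerts.append(f"CU utilization >= {warn_at}% in {len(hot)} recent samples")
    if any(s.interactive_delayed > 0 for s in samples):
        alerts.append("Interactive operations are already being delayed")
    return alerts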

The ROI Framework

For a typical mid-market enterprise running 20-50 workspaces on an F4 capacity ($5,000/month):

Before automated governance:

  • 2-3 throttling events per week
  • Average 45 minutes of degraded performance per event
  • 1 emergency F8 upgrade per quarter ($10,000/month temporary spend)
  • 8-12 hours/month of manual BPA reviews by senior developers

After automated governance:

  • 0-1 throttling events per month
  • Issues caught at PR stage, not production
  • Stable F4 capacity with no emergency upgrades
  • 0 hours of manual reviews (fully automated)

Estimated annual savings: $40,000-80,000 in capacity costs alone, plus 100-150 hours of reclaimed developer time.

What We Don’t Automate

Not everything should be code. Naming conventions and architectural standards require human judgment during initial design. Data access policies need business stakeholder input. The decision to promote a dataset from Development to Certified status involves trust and validation that no script can fully replace.

Our automation handles the mechanical enforcement. Humans handle the strategic decisions. The line between the two should be deliberate, not accidental.

Implementation

Throwing cash at an oversized Fabric F-SKU to hide terrible data modeling is the fastest path to unprofitability. By adopting code-first infrastructure, treating Power BI semantic models as structural endpoints to be interrogated by Python automation, you regain control over your cloud spend and semantic health.

The tools exist. The APIs are documented. The question is whether your organization treats governance as an afterthought or as infrastructure.