
Building ML Models in Microsoft Fabric
Train and deploy machine learning models in Microsoft Fabric using its Data Science capabilities, including MLflow experiment tracking, the model registry, and batch scoring.
Microsoft Fabric Data Science provides a complete ML lifecycle environment integrated directly into the analytics platform—from data exploration and feature engineering to model training, experiment tracking, deployment, and batch scoring. Unlike standalone ML platforms (SageMaker, Vertex AI, Databricks ML) that require separate data movement pipelines, Fabric Data Science operates directly on OneLake data, eliminating the traditional gap between data engineering and data science. Models trained in Fabric can score data in Lakehouses, power predictions in Power BI reports, and run as batch scoring jobs—all within the same capacity and governance framework. Our Microsoft Fabric consulting team helps organizations implement production ML workflows within the Fabric platform.
Fabric Data Science Architecture
| Component | Purpose | Key Feature |
|---|---|---|
| Notebooks | Interactive model development | Python, PySpark, R; pre-installed ML libraries |
| Experiments (MLflow) | Track training runs | Parameters, metrics, artifacts, model comparison |
| Model Registry | Version and manage models | Stage management, lineage tracking, deployment |
| Batch Scoring | Score data at scale | Spark-based, Lakehouse input/output |
| PREDICT function | In-database scoring | SQL and Spark PREDICT() for scoring inside queries |
| SynapseML | Pre-built ML capabilities | AutoML, cognitive services, distributed training |
All components share OneLake storage and Fabric capacity, so there is no data duplication between your data engineering Lakehouses and your ML training environments.
End-to-End ML Workflow
Phase 1: Data Exploration and Feature Engineering
Start in a Fabric notebook with direct access to Lakehouse tables:
Load data from OneLake: Read Delta tables directly using Spark DataFrames—no data copy or export required. The same Silver and Gold layer tables prepared by your data engineering team are immediately available for ML.
Exploratory Data Analysis (EDA): Use pandas, matplotlib, seaborn, and plotly (all pre-installed) for visualization and statistical analysis. Fabric notebooks render plots inline, making iterative EDA fast and visual.
Feature Engineering: Create training features by:
- Aggregating transactional data (e.g., customer lifetime value from order history)
- Encoding categorical variables (one-hot, target encoding)
- Creating time-based features (days since last purchase, rolling averages)
- Joining data from multiple Lakehouse tables using Spark SQL
Save features to a Feature Table: Write engineered features back to a dedicated Lakehouse table. This creates a reusable feature store that multiple experiments can reference—ensuring consistency between training and scoring.
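The feature engineering steps above can be sketched with pandas on a toy order-history frame. The table and column names here are hypothetical; in a Fabric notebook you would read the same data from Lakehouse Delta tables via Spark and write the result back as a feature table:

```python
import pandas as pd

# Hypothetical order history; in Fabric this would come from a Lakehouse
# Delta table, e.g. spark.read.table("orders") for Spark-scale data.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-03-10", "2024-02-01",
         "2024-02-20", "2024-04-02", "2024-03-15"]),
    "amount": [120.0, 80.0, 40.0, 60.0, 55.0, 200.0],
})

as_of = pd.Timestamp("2024-05-01")  # scoring snapshot date

# Aggregate transactional data into per-customer features.
features = orders.groupby("customer_id").agg(
    order_count=("amount", "size"),
    lifetime_value=("amount", "sum"),
    last_order=("order_date", "max"),
).reset_index()

# Time-based feature: days since last purchase.
features["days_since_last_order"] = (as_of - features["last_order"]).dt.days
features = features.drop(columns="last_order")

print(features)
# In Fabric, persist this as a reusable feature table, e.g.:
# spark.createDataFrame(features).write.format("delta").mode("overwrite") \
#      .saveAsTable("ml_features.customer_features")
```

The same aggregation logic then serves both training and scoring, which is the point of writing features to a dedicated table rather than recomputing them in each notebook.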
Phase 2: Model Training
Fabric notebooks support all major ML frameworks with no additional installation:
| Framework | Best For | Pre-installed |
|---|---|---|
| Scikit-learn | Traditional ML (classification, regression, clustering) | Yes |
| XGBoost / LightGBM | Gradient boosting (tabular data, Kaggle-winning algorithms) | Yes |
| PyTorch | Deep learning (NLP, computer vision, custom architectures) | Yes |
| TensorFlow/Keras | Deep learning (production deployment, TF Serving) | Yes |
| SynapseML | AutoML, pre-built cognitive services, distributed training | Yes |
| Prophet | Time series forecasting | Yes |
For tabular business data (customer churn prediction, demand forecasting, lead scoring), scikit-learn and XGBoost/LightGBM typically deliver strong results with the simplest workflow. Deep learning frameworks are needed primarily for unstructured data (text classification, image recognition).
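As a minimal illustration of that tabular workflow, here is a gradient-boosting training run on synthetic data using scikit-learn. In a Fabric notebook the same shape of code works with `xgboost.XGBClassifier` or `lightgbm.LGBMClassifier` swapped in for the estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a churn feature table read from a Lakehouse.
X, y = make_classification(n_samples=2000, n_features=12,
                           n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Gradient boosting is a strong default for tabular business data.
model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                   random_state=42)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC-ROC: {auc:.3f}")
```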
Phase 3: Experiment Tracking with MLflow
Fabric natively integrates MLflow for experiment management. Every training run should be tracked:
Autologging: Enable MLflow autologging at the start of your notebook. Fabric automatically logs all training parameters, performance metrics, and model artifacts for scikit-learn, XGBoost, LightGBM, PyTorch, and TensorFlow models without writing explicit logging code.
Experiment Comparison: The Fabric Experiments UI provides a visual comparison of all runs in an experiment—parameter values, metric charts, and artifact inspection side by side. Identify the best-performing model configuration quickly without building custom comparison code.
Key Metrics to Track:
- Classification: Accuracy, Precision, Recall, F1, AUC-ROC, confusion matrix
- Regression: RMSE, MAE, R-squared, residual distribution
- Forecasting: MAPE, MASE, forecast vs. actuals visualization
- Training metadata: training duration, data size, feature count, hyperparameters
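The classification metrics listed above can be computed in a few lines with scikit-learn on toy predictions, then logged to the active MLflow run in one call:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Toy ground truth, hard predictions, and predicted probabilities.
y_true  = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred  = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.2, 0.6, 0.3, 0.8, 0.9, 0.4, 0.7, 0.95, 0.05]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "auc_roc": roc_auc_score(y_true, y_score),  # uses probabilities
}
print(metrics)
print(confusion_matrix(y_true, y_pred))
# With MLflow tracking enabled, persist them in one call:
# mlflow.log_metrics(metrics)
```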
Phase 4: Model Registration and Versioning
Once you identify the best model from your experiments, register it in the Fabric Model Registry:
- From the experiment run details, click "Register Model"
- Name the model descriptively (e.g., "customer-churn-classifier"; the registry assigns version numbers, so there is no need to encode them in the name)
- Add a description documenting the model's purpose, training data, and performance
- Set the model stage: None → Staging → Production → Archived
The registry tracks model lineage—linking each registered model back to the specific experiment run, training code, data version, and hyperparameters that produced it. This is essential for audit, reproducibility, and compliance.
Phase 5: Deployment and Scoring
Batch Scoring: The most common deployment pattern for business analytics. Create a notebook that loads the registered model, reads new data from a Lakehouse table, generates predictions, and writes results back to a Lakehouse table. Schedule this notebook to run daily, weekly, or after each data refresh.
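The batch scoring pattern can be sketched with pandas and a persisted scikit-learn model. Table names and the joblib file are hypothetical; in Fabric you would load the registered model (e.g. via `mlflow.sklearn.load_model`) and read and write Lakehouse tables with Spark:

```python
import pandas as pd
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# One-time setup for this sketch: train and persist a tiny model.
# In Fabric you would instead load a registered model, e.g.
# mlflow.sklearn.load_model("models:/customer-churn-classifier/3").
train = pd.DataFrame({"tenure": [1, 24, 3, 48, 6],
                      "monthly_spend": [90, 30, 85, 20, 70]})
labels = [1, 0, 1, 0, 1]
dump(LogisticRegression(max_iter=1000).fit(train, labels),
     "churn_model.joblib")

# --- the scheduled batch scoring job ---
model = load("churn_model.joblib")                       # load model
new_customers = pd.DataFrame({"customer_id": [101, 102], # read new rows
                              "tenure": [2, 36],
                              "monthly_spend": [95, 25]})
new_customers["churn_probability"] = model.predict_proba(
    new_customers[["tenure", "monthly_spend"]])[:, 1]    # predict
print(new_customers)
# In Fabric, write results back for Power BI to consume:
# spark.createDataFrame(new_customers).write.mode("append") \
#      .saveAsTable("predictions.customer_churn")
```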
PREDICT Function: Fabric supports a PREDICT() function usable in SQL and Spark notebooks. Register your model, then call PREDICT directly in SQL queries against Lakehouse tables—enabling prediction scoring without Python code.
Power BI Integration: Connect Power BI to the Lakehouse table containing prediction results. Build reports that show predicted customer churn risk alongside actual customer metrics, forecast demand alongside inventory levels, or scored leads alongside sales pipeline data.
AutoML with SynapseML
For organizations without dedicated data science teams, SynapseML provides automated machine learning:
- Automated feature engineering: Detects feature types and applies appropriate transformations
- Algorithm selection: Tests multiple algorithms (logistic regression, random forest, gradient boosting, neural networks) and selects the best performer
- Hyperparameter tuning: Grid search and Bayesian optimization across the parameter space
- Model explanation: Generates feature importance rankings and partial dependence plots
AutoML does not replace expert data science for complex problems, but it provides a strong baseline that can be deployed quickly while more sophisticated models are developed.
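The algorithm-selection and tuning loop that AutoML automates can be illustrated in miniature with scikit-learn's GridSearchCV; this is a stand-in for a SynapseML AutoML sweep, not its actual API:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Candidate algorithms, each with its own hyperparameter grid.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "forest": (RandomForestClassifier(random_state=7),
               {"n_estimators": [100, 300], "max_depth": [None, 8]}),
}

# Cross-validated search over every algorithm/parameter combination;
# keep the overall best, as an AutoML sweep does at larger scale.
best_name, best_search = None, None
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=3,
                          scoring="roc_auc").fit(X_train, y_train)
    if best_search is None or search.best_score_ > best_search.best_score_:
        best_name, best_search = name, search

print(best_name, round(best_search.best_score_, 3))
holdout_auc = best_search.score(X_test, y_test)  # AUC on held-out data
```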
Best Practices
- Version training data: Use Delta Lake time travel to pin the exact dataset version used for each experiment. This ensures reproducibility.
- Separate feature engineering from training: Reusable feature tables enable multiple experiments without re-computing features each time
- Enable autologging early: Track everything from the first experiment. You cannot retroactively log parameters from untracked runs.
- Use the staging workflow: Never deploy directly to production. Stage models, validate against holdout data, compare with the current production model, then promote.
- Monitor model drift: Schedule periodic comparisons of model predictions vs actual outcomes. When accuracy degrades beyond threshold, retrain.
- Document business context: Register models with clear descriptions of what they predict, what actions the business should take on predictions, and known limitations.
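The drift-monitoring practice above can be implemented with a simple statistic such as the population stability index (PSI) comparing a training-time feature distribution against current scoring data. A self-contained sketch with the usual rule-of-thumb thresholds:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline (training-time) sample and current data.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift (consider retraining)."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) in sparse bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)  # feature at training time
same = rng.normal(0.0, 1.0, 5000)      # fresh data, no drift
shifted = rng.normal(0.8, 1.0, 5000)   # fresh data, mean has drifted

psi_same = population_stability_index(baseline, same)
psi_shifted = population_stability_index(baseline, shifted)
print(round(psi_same, 3), round(psi_shifted, 3))
```

A scheduled Fabric notebook can run a check like this per feature and per prediction column, writing the PSI values to a monitoring table that feeds an alerting report.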
Related Resources
- Fabric Notebooks PySpark
- AI Features in Power BI
- Python Integration in Power BI
- Data Analytics Services
ML Model Deployment Best Practices in Fabric
After deploying dozens of ML models in Fabric for enterprise clients, these are the patterns that consistently deliver production-grade results:
- Feature stores in OneLake: Centralize engineered features as Delta tables so multiple models share consistent inputs. This eliminates the most common cause of training/serving skew: inconsistent feature computation between training and inference.
- Model versioning with MLflow: Every model training run should log parameters, metrics, and artifacts to the built-in MLflow tracking server. When a model degrades, you need instant access to the last known good version.
- Automated retraining pipelines: Schedule weekly or monthly retraining using Fabric notebooks triggered by Data Factory pipelines. Compare new model performance against the production baseline before promoting.
- A/B testing framework: Deploy new models alongside existing ones and route 10% of traffic to the challenger. Only promote when the challenger demonstrates statistically significant improvement over 2+ weeks.
- Monitoring and alerting: Use Data Activator to trigger alerts when prediction accuracy drops below thresholds or input data distribution shifts significantly.
The key insight from my experience: the model itself is 20% of the work. The other 80% is the infrastructure around it — data pipelines, monitoring, retraining, and governance. Fabric handles more of that 80% than any other platform I have used.
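The "statistically significant improvement" check in the A/B testing bullet above can be as simple as a one-sided two-proportion z-test between the champion's and challenger's conversion (or accuracy) counts. A dependency-free sketch with made-up counts:

```python
from math import erf, sqrt

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """One-sided test that variant B's rate exceeds variant A's.
    Returns (z, p_value); promote the challenger only when p_value
    is below your chosen alpha (e.g. 0.05) over the full test window."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # upper-tail normal
    return z, p_value

# Hypothetical counts: champion converted 520/10000 on 90% of traffic;
# challenger converted 78/1000 on its 10% slice.
z, p = two_proportion_z_test(520, 10000, 78, 1000)
print(f"z={z:.2f}, p={p:.4f}")
```

In practice you would also require the effect to hold over two or more weeks, as the bullet suggests, to rule out novelty and seasonality effects.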
For help building production ML pipelines in Microsoft Fabric, contact our team.
Model Governance in Microsoft Fabric
For enterprise ML deployments, governance is not optional — it is a compliance requirement in regulated industries:
- Model registry: Register every model in the MLflow model registry with version, training data hash, performance metrics, and owner. This creates an audit trail that supports SOC 2 and HIPAA obligations for AI systems.
- Approval workflow: Require human review and sign-off before any model moves from staging to production. Implement this through Azure DevOps pull request approvals tied to Fabric deployment pipelines.
- Bias monitoring: Schedule monthly bias audits that compare model predictions across protected classes (age, gender, race). Log results and remediation actions — regulators increasingly require this documentation.
- Explainability requirements: For models that influence decisions about people (credit scoring, hiring, insurance), implement SHAP or LIME explanations and surface them in Power BI reports alongside predictions. GDPR Article 22 grants individuals rights around automated decision-making, and US state AI regulations increasingly expect this kind of explanation.
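A monthly bias audit like the one described above can start with a simple demographic-parity check: compare positive-prediction rates across groups and flag large disparities (the 0.8 ratio threshold below echoes the "four-fifths rule" convention and is an illustrative choice, not legal advice):

```python
import pandas as pd

# Hypothetical scored records joined with a protected attribute.
scored = pd.DataFrame({
    "group":     ["A"] * 6 + ["B"] * 6,
    "predicted": [1, 1, 1, 0, 1, 0,   1, 0, 0, 0, 1, 0],
})

# Positive-prediction rate per group.
rates = scored.groupby("group")["predicted"].mean()

# Demographic-parity style disparity: lowest rate over highest rate.
disparity_ratio = rates.min() / rates.max()
needs_review = disparity_ratio < 0.8  # flag for the monthly audit log
print(rates.to_dict(), round(disparity_ratio, 2), needs_review)
```

The per-group rates, the ratio, and any remediation action taken belong in the audit log table alongside the model version that produced the predictions.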
For help building governed ML pipelines in Microsoft Fabric, contact our team for an AI governance assessment.
Frequently Asked Questions
What ML frameworks does Fabric support?
Fabric supports scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, and other Python-based ML libraries. You can install additional packages as needed.
Can I use AutoML in Microsoft Fabric?
Yes, Fabric includes automated machine learning capabilities that can automatically select algorithms, tune hyperparameters, and generate feature engineering suggestions.