Power BI Python and R Integration: Advanced Analytics Guide for 2026

Integrate Python and R with Power BI for advanced analytics—covering Python visuals, R visuals, Python scripts in Power Query, statistical analysis, machine learning models, supported packages, limitations, and Fabric notebooks as an enterprise alternative.

By EPC Group

<p>Power BI is a business intelligence platform. Python and R are data science languages. When integrated correctly, the combination unlocks analytical capabilities that neither tool provides alone—statistical testing, machine learning model scoring, custom visualizations beyond Power BI's native visual library, advanced time series forecasting, clustering, anomaly detection, and natural language processing. When integrated poorly, the combination creates fragile, slow, ungovernable reports that break in production and terrify IT administrators.</p>

<p>This guide covers every integration pattern between Python/R and Power BI, with honest assessments of what works, what does not, and where <a href="/blog/fabric-notebooks-pyspark">Fabric notebooks</a> provide a superior alternative. Our <a href="/services/power-bi-consulting">Power BI consulting practice</a> has implemented Python and R integrations for Fortune 500 organizations in healthcare, finance, and manufacturing—environments where statistical rigor, reproducibility, and governance are non-negotiable.</p>

<h2>Integration Architecture Overview</h2>

<p>Python and R integrate with Power BI through four distinct patterns, each with different execution contexts, capabilities, and limitations:</p>

<table> <thead><tr><th>Pattern</th><th>Execution Context</th><th>When It Runs</th><th>Use Case</th></tr></thead> <tbody> <tr><td>Python/R Visuals</td><td>Power BI Desktop (local) or Power BI Service (managed runtime)</td><td>Report render time (when user opens report or changes filter)</td><td>Custom statistical visualizations</td></tr> <tr><td>Python Scripts in Power Query</td><td>Power BI Desktop (local) or On-premises Gateway (for scheduled refresh)</td><td>Data refresh time</td><td>Data transformation, API calls, web scraping</td></tr> <tr><td>Azure ML Integration</td><td>Azure Machine Learning managed endpoints</td><td>Data refresh time (Dataflow invocation)</td><td>ML model scoring at scale</td></tr> <tr><td>Fabric Notebooks</td><td>Fabric Spark compute</td><td>Pipeline execution (scheduled or triggered)</td><td>Large-scale data science, feature engineering, model training</td></tr> </tbody> </table>

<h2>Python Visuals in Power BI</h2>

<p>Python visuals allow you to create matplotlib, seaborn, or plotly-generated visualizations directly within a Power BI report page. The visual receives a pandas DataFrame from the Power BI semantic model (filtered by the current report context—slicers, cross-filters, drill-throughs) and returns a static image rendered by your Python script.</p>

<h3>Setting Up the Python Environment</h3>

<p><strong>Power BI Desktop</strong>: Install Python locally (3.8+ recommended). In Power BI Desktop, go to File &gt; Options &gt; Python scripting &gt; set the detected Python home directory. If you use virtual environments (recommended), point to the virtual environment's Python executable.</p>

<p><strong>Required packages</strong> (install via pip):</p> <pre><code>pip install pandas matplotlib seaborn scipy scikit-learn statsmodels plotly </code></pre>

<p><strong>Power BI Service</strong>: The service uses a managed Python runtime with pre-installed packages. You cannot install custom packages in the service runtime. The pre-installed packages include: pandas, numpy, scipy, scikit-learn, matplotlib, seaborn, statsmodels, and several others. Check the <a href="https://learn.microsoft.com/en-us/power-bi/connect-data/service-python-packages-support">official Microsoft documentation</a> for the current package list.</p>

<h3>Creating a Python Visual</h3>

<p>In Power BI Desktop, add a Python visual from the Visualizations pane. Drag fields from your semantic model into the Values well. Power BI converts these fields into a pandas DataFrame called <code>dataset</code>. Write your Python script in the editor:</p>

<pre><code>import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# dataset is provided by Power BI (filtered by current context)
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Distribution plot
sns.histplot(data=dataset, x='Revenue', hue='Region', multiple='stack', bins=30, ax=axes[0])
axes[0].set_title('Revenue Distribution by Region')
axes[0].set_xlabel('Revenue ($)')

# Box plot with outlier detection
sns.boxplot(data=dataset, x='Region', y='Revenue', ax=axes[1])
axes[1].set_title('Revenue Spread and Outliers by Region')
axes[1].set_ylabel('Revenue ($)')

plt.tight_layout()
plt.show()
</code></pre>

<h3>Advanced Python Visual Patterns</h3>

<p><strong>Pattern 1: Statistical Hypothesis Testing</strong></p> <pre><code>import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import numpy as np

# Compare two groups
group_a = dataset[dataset['Segment'] == 'Enterprise']['Revenue'].dropna()
group_b = dataset[dataset['Segment'] == 'SMB']['Revenue'].dropna()

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(group_a, bins=25, alpha=0.5, label=f'Enterprise (n={len(group_a)}, mean={group_a.mean():,.0f})')
ax.hist(group_b, bins=25, alpha=0.5, label=f'SMB (n={len(group_b)}, mean={group_b.mean():,.0f})')
ax.set_title(f'Revenue Distribution: Enterprise vs SMB\nt={t_stat:.2f}, p={p_value:.4f} {"(Significant)" if p_value &lt; 0.05 else "(Not Significant)"}')
ax.set_xlabel('Revenue ($)')
ax.set_ylabel('Frequency')
ax.legend()
plt.tight_layout()
plt.show()
</code></pre>

<p><strong>Pattern 2: Time Series Decomposition</strong></p> <pre><code>import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
import pandas as pd

# Prepare time series
ts = dataset.set_index('Date')['Revenue'].sort_index()
ts = ts.resample('M').sum()  # Monthly aggregation

# Decompose into trend, seasonal, residual
result = seasonal_decompose(ts, model='additive', period=12)

fig, axes = plt.subplots(4, 1, figsize=(12, 10), sharex=True)
result.observed.plot(ax=axes[0], title='Observed')
result.trend.plot(ax=axes[1], title='Trend')
result.seasonal.plot(ax=axes[2], title='Seasonal')
result.resid.plot(ax=axes[3], title='Residual')
plt.tight_layout()
plt.show()
</code></pre>

<p><strong>Pattern 3: Clustering with Visualization</strong></p> <pre><code>import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import pandas as pd
import numpy as np

features = dataset[['Revenue', 'Frequency', 'Recency']].dropna()
scaler = StandardScaler()
scaled = scaler.fit_transform(features)

# Determine optimal k using silhouette score
scores = []
K_range = range(2, 8)
for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(scaled)
    scores.append(silhouette_score(scaled, labels))

best_k = K_range[np.argmax(scores)]
km_final = KMeans(n_clusters=best_k, random_state=42, n_init=10)
features['Cluster'] = km_final.fit_predict(scaled)

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Cluster scatter
colors = plt.cm.Set1(np.linspace(0, 1, best_k))
for i in range(best_k):
    mask = features['Cluster'] == i
    axes[0].scatter(features.loc[mask, 'Revenue'], features.loc[mask, 'Frequency'],
                    c=[colors[i]], label=f'Cluster {i} (n={mask.sum()})', alpha=0.6)
axes[0].set_xlabel('Revenue')
axes[0].set_ylabel('Purchase Frequency')
axes[0].set_title(f'Customer Segments (k={best_k})')
axes[0].legend()

# Silhouette scores
axes[1].bar(K_range, scores)
axes[1].set_xlabel('Number of Clusters')
axes[1].set_ylabel('Silhouette Score')
axes[1].set_title('Optimal Cluster Selection')
axes[1].axvline(best_k, color='red', linestyle='--', label=f'Best k={best_k}')
axes[1].legend()

plt.tight_layout()
plt.show()
</code></pre>

<h2>R Visuals in Power BI</h2>

<p>R visuals follow the same pattern as Python visuals—Power BI passes the filtered data to the script as a data frame named <code>dataset</code>, and the script outputs a visualization. R excels in statistical depth and has a mature ecosystem for specific analytical domains.</p>

<h3>Why Choose R Over Python in Power BI</h3>

<ul> <li><strong>ggplot2</strong>: The grammar-of-graphics implementation in R is unmatched for publication-quality statistical visualizations with fine-grained aesthetic control</li> <li><strong>Specialized statistical packages</strong>: R has deeper coverage for survival analysis (survival, survminer), Bayesian analysis (brms, rstanarm), clinical trials (rpact), and econometrics (plm, AER)</li> <li><strong>Forecasting</strong>: The forecast package (auto.arima, ETS) and prophet package provide robust time series forecasting with confidence intervals natively rendered in ggplot2</li> <li><strong>Bioinformatics and healthcare</strong>: Bioconductor packages for genomics, and specialized healthcare analytics packages make R the preferred choice for clinical data visualization</li> </ul>

<h3>R Visual Examples</h3>

<p><strong>Pattern 1: ggplot2 Advanced Statistical Plot</strong></p> <pre><code>library(ggplot2)
library(ggthemes)

# dataset is provided by Power BI
p &lt;- ggplot(dataset, aes(x = Revenue, y = Margin, color = Segment)) +
  geom_point(alpha = 0.6, size = 2) +
  geom_smooth(method = "lm", se = TRUE, linetype = "dashed") +
  facet_wrap(~Region, scales = "free") +
  scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  labs(title = "Revenue vs Margin by Region and Segment",
       subtitle = "Linear trend with 95% confidence interval",
       x = "Revenue ($)", y = "Margin (%)",
       caption = "Source: Enterprise Data Warehouse") +
  theme_economist() +
  theme(legend.position = "bottom")

print(p)
</code></pre>

<p><strong>Pattern 2: Survival Analysis (Healthcare / Insurance)</strong></p> <pre><code>library(survival)
library(survminer)

# Fit Kaplan-Meier survival curve
fit &lt;- survfit(Surv(Time, Event) ~ Treatment, data = dataset)

# Plot with risk table
p &lt;- ggsurvplot(fit, data = dataset,
                risk.table = TRUE,
                pval = TRUE,
                conf.int = TRUE,
                palette = c("#1f77b4", "#d62728"),
                title = "Patient Survival by Treatment Group",
                xlab = "Time (months)",
                ylab = "Survival Probability",
                legend.title = "Treatment",
                risk.table.height = 0.25)

print(p)
</code></pre>

<p><strong>Pattern 3: Forecasting with Prophet</strong></p> <pre><code>library(prophet)
library(ggplot2)
library(dplyr)

# Prepare data for Prophet
df &lt;- dataset %&gt;%
  select(ds = Date, y = Revenue) %&gt;%
  arrange(ds)

# Fit model
m &lt;- prophet(df, yearly.seasonality = TRUE, weekly.seasonality = FALSE)

# Forecast 12 months ahead (365 daily periods)
future &lt;- make_future_dataframe(m, periods = 365)
forecast &lt;- predict(m, future)

# Plot
p &lt;- plot(m, forecast) +
  labs(title = "Revenue Forecast (12-Month Horizon)", x = "Date", y = "Revenue ($)") +
  theme_minimal()

print(p)
</code></pre>

<h2>Python Scripts in Power Query</h2>

<p>Beyond visuals, Python can execute during the data refresh phase inside Power Query. This is a fundamentally different integration pattern—it runs at ETL time, not render time—and enables transformations that M language cannot express.</p>

<h3>How It Works</h3>

<ol> <li>In Power Query Editor, add a step: Transform &gt; Run Python Script</li> <li>The current query result is passed to the script as a pandas DataFrame called <code>dataset</code></li> <li>The script can create new DataFrames. Each DataFrame defined in the script appears as a navigation table in Power Query</li> <li>Select the output DataFrame to continue the Power Query transformation chain</li> </ol>
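<p>The steps above can be exercised outside Power BI by stubbing the <code>dataset</code> input. A minimal sketch of a "Run Python Script" step (the stub DataFrame and column name are illustrative—inside Power Query, <code>dataset</code> is injected for you):</p>

```python
import pandas as pd

# Inside Power Query, 'dataset' is provided automatically; the stub
# below exists only so the script can be tested standalone.
try:
    dataset  # injected by Power BI at refresh time
except NameError:
    dataset = pd.DataFrame({'Revenue': [100.0, 250.0, None, 400.0]})

# A transformation that is awkward in M: linear interpolation of gaps
cleaned = dataset.copy()
cleaned['Revenue'] = cleaned['Revenue'].interpolate()

# Every top-level DataFrame defined here ('dataset', 'cleaned') appears
# as a selectable table in the script step's navigation view.
```

<p>Selecting <code>cleaned</code> in the navigator continues the Power Query transformation chain with the interpolated table.</p>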

<h3>Use Cases for Python in Power Query</h3>

<p><strong>1. API Data Ingestion</strong></p> <pre><code>import pandas as pd
import requests

# Call a REST API that Power Query cannot easily handle
response = requests.get(
    'https://api.example.com/data',
    headers={'Authorization': 'Bearer YOUR_TOKEN'},
    params={'start_date': '2026-01-01', 'end_date': '2026-02-23'}
)

data = response.json()
output = pd.json_normalize(data['results'])
</code></pre>

<p><strong>2. Advanced Text Processing / NLP</strong></p> <pre><code>import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Cluster customer feedback into topic groups
vectorizer = TfidfVectorizer(max_features=1000, stop_words='english')
tfidf = vectorizer.fit_transform(dataset['Feedback'].fillna(''))

km = KMeans(n_clusters=5, random_state=42, n_init=10)
dataset['Topic_Cluster'] = km.fit_predict(tfidf)

# Get top terms per cluster
feature_names = vectorizer.get_feature_names_out()
cluster_labels = {}
for i in range(5):
    center = km.cluster_centers_[i]
    top_indices = center.argsort()[-5:][::-1]
    cluster_labels[i] = ', '.join([feature_names[j] for j in top_indices])

dataset['Topic_Label'] = dataset['Topic_Cluster'].map(cluster_labels)
</code></pre>

<p><strong>3. Statistical Outlier Detection</strong></p> <pre><code>import pandas as pd
import numpy as np
from scipy import stats

# Z-score based outlier detection
numeric_cols = dataset.select_dtypes(include=[np.number]).columns
z_scores = np.abs(stats.zscore(dataset[numeric_cols].fillna(0)))

# Flag rows where any column exceeds z-score threshold
dataset['Is_Outlier'] = (z_scores &gt; 3).any(axis=1)
dataset['Max_Z_Score'] = z_scores.max(axis=1)
dataset['Outlier_Column'] = numeric_cols[z_scores.argmax(axis=1)]
</code></pre>

<p><strong>4. Web Scraping for Enrichment</strong></p> <pre><code>import pandas as pd
import requests
from bs4 import BeautifulSoup

# Enrich company data with website metadata
def get_company_info(url):
    try:
        resp = requests.get(url, timeout=10)
        soup = BeautifulSoup(resp.text, 'html.parser')
        title = soup.title.string if soup.title else ''
        description = ''
        meta_desc = soup.find('meta', attrs={'name': 'description'})
        if meta_desc:
            description = meta_desc.get('content', '')
        return pd.Series([title, description])
    except Exception:
        return pd.Series(['', ''])

dataset[['Site_Title', 'Site_Description']] = dataset['Website'].apply(get_company_info)
</code></pre>

<h3>Python in Power Query: Critical Limitations</h3>

<ul> <li><strong>Gateway requirement</strong>: For scheduled refresh in the Power BI service, Python scripts in Power Query require the on-premises data gateway with Python installed on the gateway machine. The Power BI service cannot execute Python scripts natively during refresh</li> <li><strong>No query folding</strong>: Python script steps break <a href="/blog/query-folding-power-query-troubleshooting-guide-2026">query folding</a> completely. All data must be loaded into memory before the Python step executes</li> <li><strong>Package management</strong>: The gateway machine must have all required packages installed. Use a requirements.txt and virtual environment to ensure reproducibility</li> <li><strong>Timeout</strong>: Scripts have a 30-minute timeout during scheduled refresh. Long-running ML training or large API calls may fail</li> <li><strong>Security</strong>: Python scripts execute arbitrary code on the gateway machine. Review all scripts for security implications before deploying to production. Restrict who can publish datasets with Python scripts</li> <li><strong>Debugging</strong>: Error messages from failed Python scripts in scheduled refresh are often cryptic. Test scripts thoroughly in Power BI Desktop before publishing</li> </ul>

<h2>Supported Packages and Version Considerations</h2>

<h3>Power BI Service Python Runtime</h3>

<p>The Power BI service provides a managed Python runtime for Python visuals (not for Power Query scripts—those require the gateway). The runtime includes a curated set of pre-installed packages. Key packages available:</p>

<table> <thead><tr><th>Category</th><th>Packages</th></tr></thead> <tbody> <tr><td>Data manipulation</td><td>pandas, numpy</td></tr> <tr><td>Visualization</td><td>matplotlib, seaborn, plotly (static export)</td></tr> <tr><td>Statistics</td><td>scipy, statsmodels</td></tr> <tr><td>Machine learning</td><td>scikit-learn</td></tr> <tr><td>NLP</td><td>nltk (limited)</td></tr> <tr><td>Date/time</td><td>dateutil</td></tr> </tbody> </table>
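<p>Because the service runtime pins its own package versions, it is worth printing your local versions before publishing so mismatches—a known cause of result drift between Desktop and service—are caught early. A small sketch using only the standard library (the package list is illustrative):</p>

```python
from importlib.metadata import version, PackageNotFoundError

# Packages to compare against the service runtime's published list
packages = ['pandas', 'numpy', 'scipy', 'scikit-learn',
            'statsmodels', 'matplotlib', 'seaborn']

report = {}
for name in packages:
    try:
        report[name] = version(name)
    except PackageNotFoundError:
        report[name] = 'not installed'

for name, ver in report.items():
    print(f'{name}: {ver}')
```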

<p><strong>Packages NOT available in the service runtime</strong> (common requests that fail):</p> <ul> <li>tensorflow, pytorch, keras (deep learning frameworks—too large for the runtime)</li> <li>xgboost, lightgbm, catboost (gradient boosting—not pre-installed)</li> <li>transformers (Hugging Face—not pre-installed)</li> <li>geopandas (geospatial analysis—dependency issues)</li> <li>Any package requiring C compilation during install</li> </ul>

<p>If you need these packages, use <a href="/blog/fabric-notebooks-pyspark">Fabric notebooks</a> where you have full control over the Python environment.</p>

<h3>Power BI Service R Runtime</h3>

<p>The R runtime in Power BI service includes:</p> <ul> <li>ggplot2, tidyverse (dplyr, tidyr, stringr, purrr, forcats)</li> <li>forecast, prophet</li> <li>survival, survminer</li> <li>caret, randomForest, e1071</li> <li>leaflet (static map export), plotly (static export)</li> <li>lubridate, zoo, xts</li> </ul>

<h2>Limitations Across All Integration Patterns</h2>

<p>Before committing to Python/R integration in Power BI, understand the limitations that affect all patterns:</p>

<h3>Python/R Visuals Limitations</h3>

<ol> <li><strong>150,000 row limit</strong>: Power BI samples the dataset to 150,000 rows before passing to the script. If your dataset exceeds this threshold, results are based on a sample, not the full dataset. This is particularly problematic for clustering, outlier detection, and distribution analysis where sampling introduces bias</li> <li><strong>30-second execution timeout</strong>: Scripts in the Power BI service must complete within 30 seconds. Complex ML models, large data processing, or slow API calls will timeout</li> <li><strong>Static image output</strong>: Python/R visuals render as static PNG images. No hover tooltips, no click interactions, no cross-filtering. Users see a picture, not an interactive visual</li> <li><strong>No Power BI Embedded support</strong>: Python/R visuals cannot be rendered in Power BI Embedded (the embedding technology for external-facing applications). If you plan to embed reports for customers, do not use Python/R visuals</li> <li><strong>No paginated reports</strong>: Python/R visuals are not supported in paginated reports</li> <li><strong>Render on every interaction</strong>: The script re-executes every time the report context changes (slicer change, cross-filter, page navigation). This means ML models retrain on every filter change—computationally wasteful and potentially confusing if results change with each interaction</li> <li><strong>No incremental execution</strong>: There is no caching or incremental execution. Every render is a full script execution from scratch</li> </ol>

<h3>Python in Power Query Limitations</h3>

<ol> <li><strong>Gateway dependency</strong>: Scheduled refresh requires Python on the on-premises data gateway. Cloud-only architectures cannot use Python in Power Query for scheduled refresh</li> <li><strong>Breaks query folding</strong>: Any step before or containing Python breaks folding for all subsequent steps</li> <li><strong>Environment management</strong>: Package versions must be managed on the gateway machine. Version conflicts between datasets sharing the same gateway can cause failures</li> <li><strong>No parallelism</strong>: Python scripts execute sequentially during refresh. Multiple tables with Python steps process one at a time</li> <li><strong>Memory constraints</strong>: The entire dataset must fit in memory on the gateway machine before the Python step executes. For large datasets, this can exhaust gateway RAM</li> </ol>

<h2>Fabric Notebooks: The Enterprise Alternative</h2>

<p>For organizations hitting the limitations of Python/R integration in Power BI, <a href="/blog/fabric-notebooks-pyspark">Microsoft Fabric notebooks</a> provide a scalable, governed alternative that addresses every limitation listed above.</p>

<h3>Why Fabric Notebooks Excel</h3>

<table> <thead><tr><th>Limitation</th><th>Python/R in Power BI</th><th>Fabric Notebooks</th></tr></thead> <tbody> <tr><td>Data volume</td><td>150K rows (visuals), memory-limited (Power Query)</td><td>Billions of rows via Spark distributed compute</td></tr> <tr><td>Execution time</td><td>30 seconds (visuals), 30 minutes (Power Query)</td><td>Hours if needed, with checkpoint/restart</td></tr> <tr><td>Package control</td><td>Pre-installed only (service), manual install (gateway)</td><td>Full pip/conda environment control per notebook</td></tr> <tr><td>Output</td><td>Static image (visuals), DataFrame (Power Query)</td><td>Delta tables in OneLake, directly consumed by Direct Lake models</td></tr> <tr><td>Governance</td><td>Scripts embedded in .pbix files, no version control</td><td>Git integration, CI/CD, code review workflows</td></tr> <tr><td>Compute isolation</td><td>Shared mashup engine or gateway</td><td>Dedicated Spark sessions with configurable compute</td></tr> <tr><td>Scheduling</td><td>Tied to dataset refresh</td><td>Independent pipeline scheduling with dependencies</td></tr> </tbody> </table>

<h3>Recommended Architecture: Fabric Notebooks + Power BI</h3>

<p>The optimal architecture separates data science computation from Power BI visualization:</p>

<ol> <li><strong>Feature engineering</strong>: Fabric notebooks read raw data from Lakehouse tables, engineer features, and write feature tables back to the Lakehouse</li> <li><strong>Model training</strong>: Fabric notebooks train ML models using the full Spark compute with any packages (scikit-learn, XGBoost, PyTorch, transformers). Models are registered in MLflow for versioning</li> <li><strong>Model scoring</strong>: Scheduled Fabric notebooks score new data using the registered model and write predictions to a scored output table in the Lakehouse</li> <li><strong>Power BI consumption</strong>: A <a href="/blog/power-bi-direct-lake-mode-guide-2026">Direct Lake</a> semantic model reads the scored output table from OneLake. Power BI reports visualize predictions, feature importance, model performance, and business metrics using native Power BI visuals—no Python/R visuals needed</li> </ol>
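<p>Steps 3 and 4 reduce to "score, then persist a table the semantic model can read." A pandas/scikit-learn sketch of that shape—the table names, columns, and the commented Spark write are hypothetical stand-ins for the Spark/MLflow code a real Fabric notebook would use:</p>

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for a Lakehouse training table
history = pd.DataFrame({'ad_spend': [10, 20, 30, 40],
                        'revenue':  [110, 205, 310, 400]})
model = LinearRegression().fit(history[['ad_spend']], history['revenue'])

# Batch-score new rows and shape the scored output table
current = pd.DataFrame({'ad_spend': [25, 50]})
scored_output = current.assign(
    predicted_revenue=model.predict(current[['ad_spend']]))

# In a Fabric notebook this table would be written back to the
# Lakehouse for a Direct Lake model to consume, e.g.:
# spark.createDataFrame(scored_output).write.mode('overwrite') \
#      .saveAsTable('scored_revenue')
```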

<p>This architecture provides:</p> <ul> <li>Full-scale data science computation without Power BI's 150K row limit</li> <li>Any Python/R package without service runtime restrictions</li> <li>Interactive Power BI visuals (not static images) for the scored output</li> <li>Git-integrated, version-controlled, code-reviewed data science code</li> <li>Separation of compute (Spark for data science, Power BI capacity for visualization)</li> <li>Governance through Fabric workspace security, lineage, and Purview integration</li> </ul>

<h2>Data Science Workflows in Power BI</h2>

<h3>Workflow 1: Anomaly Detection Pipeline</h3>

<p>Detect anomalies in business metrics (revenue spikes/drops, unusual transaction patterns, sensor readings outside normal ranges):</p>

<ol> <li><strong>Feature table</strong> (Fabric notebook): Calculate rolling statistics (7-day mean, standard deviation, z-score) for each metric. Write to Lakehouse</li> <li><strong>Model scoring</strong> (Fabric notebook): Apply Isolation Forest or DBSCAN to identify anomalous data points. Write anomaly scores and flags to Lakehouse</li> <li><strong>Power BI dashboard</strong>: Display time series with anomalous points highlighted (conditional formatting on anomaly flag column). Enable drill-through from anomalous points to detailed feature analysis</li> </ol>
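<p>Step 2 of this pipeline can be sketched with scikit-learn's Isolation Forest on synthetic data (the injected spike, column names, and contamination setting are illustrative):</p>

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic daily revenue with one obvious spike injected
rng = np.random.default_rng(42)
daily = pd.DataFrame({'Revenue': rng.normal(10_000, 500, 60)})
daily.loc[45, 'Revenue'] = 25_000

# Rolling-statistic feature from step 1
features = daily[['Revenue']].copy()
features['rolling_mean_7'] = daily['Revenue'].rolling(7, min_periods=1).mean()

# Isolation Forest labels the most isolated ~5% of points as -1
iso = IsolationForest(contamination=0.05, random_state=42)
daily['anomaly_score'] = iso.fit_predict(features)
daily['is_anomaly'] = daily['anomaly_score'] == -1
```

<p>The resulting <code>is_anomaly</code> flag is exactly what the Power BI dashboard conditionally formats in step 3.</p>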

<h3>Workflow 2: Customer Churn Prediction</h3>

<ol> <li><strong>Feature engineering</strong> (Fabric notebook): Calculate customer behavioral features—login frequency, support ticket count, usage trend, payment history, tenure, product count</li> <li><strong>Model training</strong> (Fabric notebook): Train a gradient boosting classifier (XGBoost) with SHAP explainability. Register model in MLflow</li> <li><strong>Batch scoring</strong> (Fabric notebook, daily scheduled): Score all active customers. Write churn probability, risk tier (high/medium/low), and top 3 churn drivers per customer to Lakehouse</li> <li><strong>Power BI report</strong>: Churn risk dashboard showing at-risk customer count by tier, predicted churn rate trend, SHAP-based driver analysis, and an actionable list of high-risk customers for the retention team. Use <a href="/blog/power-bi-drillthrough">drillthrough</a> from the summary to individual customer churn profiles</li> </ol>
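<p>The scored output of step 3 boils down to a churn probability plus a bucketed risk tier per customer. A compact sketch—scikit-learn's GradientBoostingClassifier stands in for XGBoost, the tier thresholds and feature names are illustrative, and SHAP driver extraction is omitted for brevity:</p>

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

features = ['login_freq', 'ticket_count', 'tenure_months']
train = pd.DataFrame({
    'login_freq':    [30, 2, 25, 1, 28, 3, 27, 2],
    'ticket_count':  [0, 8, 1, 9, 0, 7, 1, 6],
    'tenure_months': [24, 3, 30, 2, 18, 4, 26, 5],
    'churned':       [0, 1, 0, 1, 0, 1, 0, 1],
})
model = GradientBoostingClassifier(random_state=42)
model.fit(train[features], train['churned'])

# Batch-score active customers and bucket into risk tiers
active = pd.DataFrame({'login_freq': [2, 29],
                       'ticket_count': [7, 0],
                       'tenure_months': [3, 22]})
proba = model.predict_proba(active[features])[:, 1]

def tier(p):
    return 'high' if p >= 0.7 else 'medium' if p >= 0.3 else 'low'

scored = active.assign(churn_probability=proba,
                       risk_tier=[tier(p) for p in proba])
```

<p>The <code>scored</code> table is what the daily notebook writes to the Lakehouse for the churn risk dashboard to consume.</p>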

<h3>Workflow 3: Demand Forecasting</h3>

<ol> <li><strong>Historical data preparation</strong> (Fabric notebook): Aggregate transactional data to the appropriate granularity (daily/weekly/monthly by product/location). Handle missing dates, holidays, and promotions as features</li> <li><strong>Model training</strong> (Fabric notebook): Train Prophet, ARIMA, or neural network models per product-location combination. Evaluate using backtesting (train on historical, predict known periods, measure accuracy)</li> <li><strong>Forecast generation</strong> (Fabric notebook, weekly scheduled): Generate 12-week forward forecasts with prediction intervals. Write forecasts to Lakehouse</li> <li><strong>Power BI report</strong>: Interactive forecast dashboard with actual vs. predicted overlay, forecast accuracy metrics (MAPE, RMSE), and confidence intervals rendered as area charts. Slicers for product, location, and time horizon. See our <a href="/blog/power-bi-what-if">what-if parameters guide</a> for scenario adjustment</li> </ol>
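<p>The backtesting idea in step 2—hold out known periods, forecast them, measure accuracy—can be shown with a seasonal-naive model standing in for Prophet/ARIMA (the demand series is synthetic, so the pattern repeats exactly and MAPE comes out to zero):</p>

```python
import numpy as np
import pandas as pd

# Two years of monthly demand with a repeating 12-month pattern
months = pd.period_range('2024-01', periods=24, freq='M')
pattern = [100, 95, 110, 120, 130, 150, 160, 155, 140, 125, 115, 170]
demand = pd.Series(np.tile(pattern, 2).astype(float), index=months)

# Hold out the last 6 months as the backtest window
train, test = demand.iloc[:-6], demand.iloc[-6:]

# Seasonal-naive forecast: repeat the value from 12 months earlier
forecast = demand.shift(12).iloc[-6:]

# MAPE over the held-out window (0 here because the pattern repeats exactly)
mape = float(np.mean(np.abs((test - forecast) / test)) * 100)
```

<p>With real data, the same loop runs per product-location combination, and the resulting MAPE/RMSE values land in the accuracy metrics surfaced in step 4's dashboard.</p>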

<h2>Security and Governance Considerations</h2>

<p>Python and R integration introduces code execution into your analytics platform. This requires security controls beyond standard Power BI governance.</p>

<h3>Script Review and Approval</h3> <ul> <li>Establish a code review process for all Python/R scripts before deployment to production. Scripts can execute arbitrary code—including network calls, file system access, and system commands</li> <li>Use Fabric Git integration for version control and pull request workflows for notebook code. See our <a href="/blog/fabric-git-integration">Git integration guide</a></li> <li>Maintain an approved package list. Packages not on the list require security review before installation on gateway machines</li> </ul>

<h3>Gateway Security for Python Scripts</h3> <ul> <li>Isolate the Python/R execution environment on the gateway machine using virtual environments or containers</li> <li>Run the gateway service under a restricted service account with minimal file system and network access</li> <li>Monitor gateway machine resource usage (CPU, memory, network) during Python script execution for unusual behavior</li> <li>Audit all Python scripts deployed via Power Query by maintaining a registry of datasets with script steps and their code content</li> </ul>

<h3>Compliance Implications</h3> <p>For regulated industries (<a href="/blog/power-bi-healthcare-hipaa-compliant-analytics-2026">healthcare/HIPAA</a>, <a href="/blog/power-bi-financial-services-regulatory-reporting-2026">financial services</a>, <a href="/blog/power-bi-government-fedramp-analytics-2026">government/FedRAMP</a>), Python/R scripts that process sensitive data must be documented, reviewed, and auditable. Fabric notebooks with Git integration provide better compliance posture than scripts embedded in .pbix files, which have no version history, no code review trail, and no separation between code and data.</p>

<h2>Decision Framework: When to Use What</h2>

<table> <thead><tr><th>Scenario</th><th>Recommended Approach</th><th>Why</th></tr></thead> <tbody> <tr><td>Custom statistical visualization (small data)</td><td>Python/R Visual</td><td>Quick implementation, interactive report context filtering</td></tr> <tr><td>One-off data transformation</td><td>Python in Power Query</td><td>Simplest implementation for ad-hoc ETL</td></tr> <tr><td>ML model scoring (any scale)</td><td>Fabric Notebook + Direct Lake</td><td>No row limits, full package control, governed</td></tr> <tr><td>Real-time model scoring</td><td>Azure ML Endpoint + Power Automate</td><td>Event-driven architecture, managed scaling</td></tr> <tr><td>Production forecasting/prediction</td><td>Fabric Notebook + Direct Lake</td><td>Reproducible, version-controlled, scalable</td></tr> <tr><td>API data ingestion (scheduled)</td><td>Fabric Data Pipeline or Notebook</td><td>Better error handling, retry logic, monitoring</td></tr> <tr><td>Customer-facing embedded analytics</td><td>Native Power BI visuals only</td><td>Python/R visuals not supported in Embedded</td></tr> <tr><td>Healthcare/finance regulated environment</td><td>Fabric Notebook (Git-integrated)</td><td>Audit trail, code review, compliance documentation</td></tr> </tbody> </table>

<p>Python and R integration transforms Power BI from a visualization tool into an advanced analytics platform. The key is choosing the right integration pattern for each use case—and increasingly, that pattern is Fabric notebooks for computation with Power BI for visualization. This separation of concerns provides the best combination of analytical power, performance, governance, and user experience. <a href="/contact">Contact EPC Group</a> for an advanced analytics architecture assessment that identifies where Python/R integration delivers value in your environment and designs an implementation that is performant, governed, and scalable.</p>

<h2>Frequently Asked Questions</h2>

<h3>What Python packages are available in the Power BI service for Python visuals?</h3>

<p>The Power BI service provides a managed Python runtime with a curated set of pre-installed packages. Core packages include pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, statsmodels, and plotly (static export only). The service does NOT include deep learning frameworks (TensorFlow, PyTorch, Keras), gradient boosting libraries (XGBoost, LightGBM, CatBoost), Hugging Face transformers, geopandas, or any package requiring C compilation during installation. You cannot install additional packages in the service runtime. If your analysis requires packages beyond the pre-installed set, use Fabric notebooks where you have full control over the Python environment via pip or conda. Check the Microsoft documentation for the current complete package list, as it is updated periodically.</p>

<h3>Why do Python visuals in Power BI show different results than my local Python script?</h3>

<p>There are three common causes. First, the 150,000 row limit: if your dataset exceeds 150,000 rows, Power BI samples the data before passing it to the Python script. Your local script processes the full dataset, producing different statistical results. Check your data volume and add a pre-aggregation step in Power Query if needed. Second, package version differences: the Power BI service runtime may use different package versions than your local installation. A scikit-learn model trained on version 1.3 locally may produce different results on version 1.2 in the service. Pin versions in your local environment to match the service. Third, report context filtering: Python visuals receive data filtered by the current slicer and cross-filter context. If slicers are active, the visual receives a subset of data. Test with all slicers cleared to compare against your full-dataset local results.</p>

<h3>Should I use Python in Power Query or Fabric notebooks for data transformation?</h3>

<p>Use Fabric notebooks for virtually all production data transformation scenarios. Python in Power Query has three critical limitations that make it unsuitable for enterprise use: it requires the on-premises data gateway (cannot run in cloud-only architectures), it breaks query folding completely (all upstream data must load into memory before the script executes), and it has no version control (scripts are embedded in the .pbix file with no Git integration or code review). Fabric notebooks solve all three: they run on Fabric Spark compute (no gateway), process data at any scale via distributed Spark, and integrate with Git for version control, pull requests, and audit trails. The only scenario where Python in Power Query is justified is quick prototyping or ad-hoc analysis in Power BI Desktop that will never be deployed to production scheduled refresh.</p>

<h3>Can I use Python or R visuals in Power BI Embedded for external-facing applications?</h3>

<p>No. Python and R visuals are not supported in Power BI Embedded (the technology used to embed Power BI reports in custom applications for external users). If you attempt to embed a report containing Python/R visuals, those visuals will display a placeholder or error message. For external-facing applications, you must use native Power BI visuals, custom visuals from AppSource, or build custom visuals using the Power BI Visuals SDK. If your analysis requires Python/R computation (such as ML predictions or statistical analysis), perform the computation in a Fabric notebook, write the results to a Lakehouse table, and build the Power BI report using native visuals on the pre-computed results. This approach works with Power BI Embedded and provides better performance since the computation happens during data refresh, not during report rendering.</p>

<h3>How do Fabric notebooks compare to Python/R visuals for machine learning in Power BI?</h3>

<p>Fabric notebooks are superior for ML in every enterprise dimension. Scale: notebooks process billions of rows via Spark distributed compute; Python visuals are limited to 150,000 rows. Packages: notebooks support any pip/conda package including XGBoost, PyTorch, transformers; the Power BI service runtime has a fixed, limited package set. Performance: notebooks execute during a scheduled pipeline run and write pre-computed results to OneLake; Python visuals re-execute the ML model on every report interaction (slicer change, filter, page navigation), which is wasteful and slow. Governance: notebooks integrate with Git for version control, code review, and audit trails; Python visual scripts are embedded in .pbix files with no version history. Output: notebook results are consumed via Direct Lake as interactive native Power BI visuals with full tooltips, cross-filtering, and drillthrough; Python visual output is a static PNG image with no interactivity. The recommended pattern is: train and score in Fabric notebooks, visualize in Power BI native visuals, and reserve Python/R visuals only for one-off statistical plots that do not need interactivity or production governance.</p>

