Google Antigravity for Python Developers: Data Science, ML & Jupyter Workflows
Google Antigravity isn't just for web developers. Python programmers and data scientists can leverage Gemini 3 for powerful AI-assisted workflows. This guide covers everything from environment setup to advanced ML pipelines.
Setting Up Python in Antigravity
Python Extension Installation
Open the Extensions panel (Cmd+Shift+X on macOS, Ctrl+Shift+X on Windows/Linux)
Install these essential extensions:
ms-python.python
ms-python.vscode-pylance
ms-python.black-formatter
charliermarsh.ruff
ms-toolsai.jupyter
Configure Python Settings
// .antigravity/settings.json
{
  "python.defaultInterpreterPath": ".venv/bin/python",
  "python.analysis.typeCheckingMode": "basic",
  "python.analysis.autoImportCompletions": true,
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": true
    }
  },
  "ruff.enable": true,
  "ruff.organizeImports": true
}
Virtual Environment Setup
# Create virtual environment
python -m venv .venv
# Activate (macOS/Linux)
source .venv/bin/activate
# Activate (Windows)
.venv\Scripts\activate
# Install data science stack
pip install pandas numpy matplotlib seaborn scikit-learn jupyter
Jupyter Notebook Integration
Native Jupyter Support
Antigravity has built-in Jupyter support. Create a .ipynb file and start coding.
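To confirm the notebook is running against the project's virtual environment (the .venv created above), a quick sanity-check cell is a reasonable first step. This is just a sketch; it assumes you selected the .venv interpreter as the notebook kernel:
# Sanity check: confirm the kernel is the project's .venv interpreter
import sys
import platform
print(f"Python {platform.python_version()}")
print(f"Interpreter: {sys.executable}")  # should point inside .venv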
AI-Assisted Notebook Development
You: "Create a Jupyter notebook for exploratory data analysis of a CSV file"
Gemini 3 generates:
# Cell 1: Imports and Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
# Configure display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline
# Cell 2: Load Data
def load_data(filepath: str) -> pd.DataFrame:
    """Load CSV file with automatic type inference."""
    df = pd.read_csv(filepath, parse_dates=True)
    print(f"Loaded {len(df):,} rows and {len(df.columns)} columns")
    return df

# Load your data
df = load_data("data/dataset.csv")
df.head()
# Cell 3: Data Overview
def data_overview(df: pd.DataFrame) -> None:
    """Print comprehensive data overview."""
    print("=" * 50)
    print("DATASET OVERVIEW")
    print("=" * 50)
    print(f"\nShape: {df.shape[0]:,} rows × {df.shape[1]} columns")
    print("\nColumn Types:")
    print(df.dtypes.value_counts())
    print("\nMissing Values:")
    missing = df.isnull().sum()
    missing = missing[missing > 0].sort_values(ascending=False)
    if len(missing) > 0:
        print(missing)
    else:
        print("No missing values!")
    print("\nNumeric Summary:")
    print(df.describe())

data_overview(df)
# Cell 4: Visualizations
def create_distribution_plots(df: pd.DataFrame, numeric_cols: list = None):
    """Create distribution plots for numeric columns."""
    if numeric_cols is None:
        numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
    n_cols = min(3, len(numeric_cols))
    n_rows = (len(numeric_cols) + n_cols - 1) // n_cols
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(5 * n_cols, 4 * n_rows))
    axes = axes.flatten() if n_rows * n_cols > 1 else [axes]
    for i, col in enumerate(numeric_cols):
        sns.histplot(df[col], kde=True, ax=axes[i])
        axes[i].set_title(f'Distribution of {col}')
    # Hide empty subplots
    for j in range(i + 1, len(axes)):
        axes[j].set_visible(False)
    plt.tight_layout()
    plt.show()

create_distribution_plots(df)
Running Notebooks with Gemini 3
Execute cells with AI context - Gemini understands previous cells
Debug errors inline - Ask "Why is this cell failing?"
Generate visualizations - "Add a correlation heatmap for numeric columns"
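As an example of that last prompt, asking for a correlation heatmap typically yields something like the following sketch; it assumes the df and the imports from the earlier notebook cells:
# Correlation heatmap for numeric columns (continues the EDA notebook above)
numeric_df = df.select_dtypes(include=[np.number])
corr = numeric_df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0, square=True)
plt.title("Correlation Heatmap (Numeric Columns)")
plt.tight_layout()
plt.show()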
Pandas with AI Assistance
Data Manipulation
You: "Clean this dataframe: remove duplicates, fill missing values with median for numeric columns, and standardize column names"
def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """
    Comprehensive dataframe cleaning.
    - Removes duplicates
    - Fills missing numeric values with median
    - Standardizes column names to snake_case
    """
    # Create a copy
    df_clean = df.copy()

    # Standardize column names
    df_clean.columns = (
        df_clean.columns
        .str.lower()
        .str.replace(' ', '_')
        .str.replace('[^a-z0-9_]', '', regex=True)
    )

    # Remove duplicates
    initial_rows = len(df_clean)
    df_clean = df_clean.drop_duplicates()
    print(f"Removed {initial_rows - len(df_clean):,} duplicate rows")

    # Fill missing numeric values with median
    numeric_cols = df_clean.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        missing_count = df_clean[col].isnull().sum()
        if missing_count > 0:
            median_val = df_clean[col].median()
            df_clean[col] = df_clean[col].fillna(median_val)
            print(f"Filled {missing_count} missing values in '{col}' with median ({median_val:.2f})")

    return df_clean

df_clean = clean_dataframe(df)
Complex Transformations
You: "Create a feature engineering pipeline for this customer dataset"
from sklearn.preprocessing import StandardScaler, LabelEncoder
from datetime import datetime
def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Feature engineering for customer dataset.
    """
    df_eng = df.copy()

    # Date-based features
    if 'signup_date' in df_eng.columns:
        df_eng['signup_date'] = pd.to_datetime(df_eng['signup_date'])
        df_eng['days_since_signup'] = (datetime.now() - df_eng['signup_date']).dt.days
        df_eng['signup_month'] = df_eng['signup_date'].dt.month
        df_eng['signup_dayofweek'] = df_eng['signup_date'].dt.dayofweek

    # Aggregate features
    if 'transaction_amount' in df_eng.columns:
        customer_stats = df_eng.groupby('customer_id').agg({
            'transaction_amount': ['sum', 'mean', 'count', 'std']
        }).reset_index()
        customer_stats.columns = ['customer_id', 'total_spend', 'avg_spend',
                                  'transaction_count', 'spend_std']
        df_eng = df_eng.merge(customer_stats, on='customer_id', how='left')

    # Recency features
    if 'last_purchase_date' in df_eng.columns:
        df_eng['last_purchase_date'] = pd.to_datetime(df_eng['last_purchase_date'])
        df_eng['days_since_purchase'] = (datetime.now() - df_eng['last_purchase_date']).dt.days
        df_eng['is_active'] = (df_eng['days_since_purchase'] < 30).astype(int)

    # Binning continuous variables
    if 'age' in df_eng.columns:
        df_eng['age_group'] = pd.cut(
            df_eng['age'],
            bins=[0, 25, 35, 50, 65, 100],
            labels=['18-25', '26-35', '36-50', '51-65', '65+']
        )

    return df_eng

df_features = engineer_features(df_clean)
print(f"Created {len(df_features.columns) - len(df_clean.columns)} new features")
Machine Learning Workflows
Building ML Pipelines
You: "Create a complete ML pipeline for predicting customer churn"
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import joblib
class ChurnPredictor:
    """End-to-end churn prediction pipeline."""

    def __init__(self):
        self.numeric_features = ['age', 'total_spend', 'days_since_purchase',
                                 'transaction_count']
        self.categorical_features = ['region', 'subscription_type']
        self.pipeline = None
        self.best_model = None

    def build_pipeline(self, model):
        """Build preprocessing + model pipeline."""
        numeric_transformer = StandardScaler()
        categorical_transformer = OneHotEncoder(handle_unknown='ignore')
        preprocessor = ColumnTransformer(
            transformers=[
                ('num', numeric_transformer, self.numeric_features),
                ('cat', categorical_transformer, self.categorical_features)
            ]
        )
        return Pipeline([
            ('preprocessor', preprocessor),
            ('classifier', model)
        ])

    def train(self, X: pd.DataFrame, y: pd.Series):
        """Train and compare multiple models."""
        models = {
            'Logistic Regression': LogisticRegression(max_iter=1000),
            'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
            'Gradient Boosting': GradientBoostingClassifier(random_state=42)
        }
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )
        results = {}
        for name, model in models.items():
            pipeline = self.build_pipeline(model)
            # Cross-validation
            cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='roc_auc')
            # Fit and evaluate
            pipeline.fit(X_train, y_train)
            y_pred = pipeline.predict(X_test)
            y_prob = pipeline.predict_proba(X_test)[:, 1]
            results[name] = {
                'cv_score': cv_scores.mean(),
                'cv_std': cv_scores.std(),
                'test_auc': roc_auc_score(y_test, y_prob),
                'pipeline': pipeline
            }
            print(f"\n{name}:")
            print(f"  CV AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")
            print(f"  Test AUC: {roc_auc_score(y_test, y_prob):.4f}")

        # Select best model
        best_name = max(results, key=lambda x: results[x]['test_auc'])
        self.pipeline = results[best_name]['pipeline']
        self.best_model = best_name
        print(f"\nBest Model: {best_name}")
        print("\nClassification Report:")
        print(classification_report(y_test, self.pipeline.predict(X_test)))
        return results

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        """Predict churn probability."""
        return self.pipeline.predict_proba(X)[:, 1]

    def save(self, path: str):
        """Save trained pipeline."""
        joblib.dump(self.pipeline, path)
        print(f"Model saved to {path}")

    def load(self, path: str):
        """Load trained pipeline."""
        self.pipeline = joblib.load(path)
        print(f"Model loaded from {path}")

# Usage
predictor = ChurnPredictor()

# Prepare features
X = df_features[predictor.numeric_features + predictor.categorical_features]
y = df_features['churned']

# Train and compare models
results = predictor.train(X, y)

# Save best model
predictor.save('models/churn_predictor.joblib')
Model Evaluation Visualizations
You: "Add visualization for model evaluation"
from sklearn.metrics import roc_curve, precision_recall_curve
def plot_model_evaluation(y_true, y_prob, model_name: str):
    """Create comprehensive model evaluation plots."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # ROC Curve
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    auc = roc_auc_score(y_true, y_prob)
    axes[0].plot(fpr, tpr, label=f'AUC = {auc:.3f}')
    axes[0].plot([0, 1], [0, 1], 'k--')
    axes[0].set_xlabel('False Positive Rate')
    axes[0].set_ylabel('True Positive Rate')
    axes[0].set_title('ROC Curve')
    axes[0].legend()

    # Precision-Recall Curve
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    axes[1].plot(recall, precision)
    axes[1].set_xlabel('Recall')
    axes[1].set_ylabel('Precision')
    axes[1].set_title('Precision-Recall Curve')

    # Probability Distribution
    axes[2].hist(y_prob[y_true == 0], bins=50, alpha=0.5, label='No Churn')
    axes[2].hist(y_prob[y_true == 1], bins=50, alpha=0.5, label='Churn')
    axes[2].axvline(x=0.5, color='r', linestyle='--', label='Threshold')
    axes[2].set_xlabel('Predicted Probability')
    axes[2].set_ylabel('Count')
    axes[2].set_title('Probability Distribution')
    axes[2].legend()

    plt.suptitle(f'Model Evaluation: {model_name}')
    plt.tight_layout()
    plt.show()

# Generate evaluation plots
y_prob = predictor.predict(X)
plot_model_evaluation(y, y_prob, predictor.best_model)
GEMINI.md for Python Projects
# Python Project Configuration
## Environment
- Python 3.11+
- Virtual environment: .venv
- Package manager: pip
## Code Style
- Formatter: Black (line length 88)
- Linter: Ruff
- Type hints required for functions
## Data Science Conventions
- Use pandas for tabular data
- NumPy for numerical operations
- scikit-learn for ML pipelines
- Matplotlib/Seaborn for visualization
## Jupyter Notebooks
- Clear outputs before committing
- Use descriptive markdown headers
- Include docstrings in functions
## ML Best Practices
- Always use pipelines
- Cross-validate models
- Log experiments with MLflow
- Version data with DVC
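The "Log experiments with MLflow" convention above might look like this in practice. This is a minimal sketch, assuming mlflow is installed and reusing the results dict returned by ChurnPredictor.train() earlier:
# Minimal MLflow logging sketch for the model comparison above (assumes `pip install mlflow`)
import mlflow

mlflow.set_experiment("churn-prediction")
for name, res in results.items():
    with mlflow.start_run(run_name=name):
        mlflow.log_param("model", name)
        mlflow.log_metric("cv_auc", res["cv_score"])
        mlflow.log_metric("test_auc", res["test_auc"])
        # The fitted pipeline in res["pipeline"] can also be logged as a model artifact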
MCP Servers for Python
Enhance Python workflows with MCP integrations:
Jupyter MCP Server
{
  "mcpServers": {
    "jupyter": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-jupyter"],
      "env": {
        "JUPYTER_TOKEN": "${JUPYTER_TOKEN}"
      }
    }
  }
}
Data Sources
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    },
    "bigquery": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-bigquery"],
      "env": {
        "GOOGLE_PROJECT_ID": "${GCP_PROJECT}"
      }
    }
  }
}
Conclusion
Google Antigravity is a powerful environment for Python and data science work. With Gemini 3, you can:
Generate boilerplate data processing code
Build ML pipelines faster
Debug complex data issues
Create visualizations on demand
Work seamlessly with Jupyter notebooks
To get started:
Set up your Python environment
Install recommended extensions
Configure GEMINI.md for your project
Connect data sources via MCP
Data science made faster with AI assistance. Focus on insights, not boilerplate.