! pip install altair
Altair: The interactive data visualization tool
Introduction
The Altair library was released in August 2016 by Jake VanderPlas, Brian Granger, and other contributors from the Jupyter and PyData communities. It is a declarative statistical visualization library for Python: instead of coding how to draw a chart, we only describe what we want to show, and Altair figures out how to render it. Altair can create a large variety of attractive, interactive graphs, from simple bar charts, histograms, line plots, and scatter plots to more complex graphs such as heatmaps and geographical maps.
So, why do we need it? After all, we could simply use the pandas and Matplotlib libraries to create these graphs.
Altair makes graphs interactive, which the libraries mentioned above either cannot do or can do only with a lot of extra code. Its features include tooltips, zooming in and out of plots, selections that change opacity, color, or size (hover effects), and dynamic updates. Altair also lets us export charts to HTML files.
Installation & Setup:
Altair can be installed by running one of the following commands (drop the leading ! when typing them in a terminal instead of a notebook cell):
Step 1:
Using pip:
! pip install altair
Using conda:
! conda install -c conda-forge altair
Step 2:
Install and enable a renderer (for Jupyter, Colab, or VS Code), which is required for displaying charts, plots, etc.
For Jupyter Notebook
! pip install vega_datasets notebook vega
import altair as alt
alt.renderers.enable('notebook')
For VS Code
import altair as alt
alt.renderers.enable('mimetype')
For Google Colab
import altair as alt
alt.renderers.enable('colab')
Key Features & Explanation with code examples:
1. Declarative Syntax and Conciseness
Altair lets us write code in a declarative manner: we specify only what we want, not how to draw it, and Altair works out the rest on its own. Because of this declarative syntax, the same chart usually takes fewer lines of code.
import pandas as pd
import altair as alt
import matplotlib.pyplot as plt
df = pd.DataFrame({
    'Name': ['Ford', 'Toyota', 'BMW', 'Honda', 'Chevy'],
    'Horsepower': [150, 120, 200, 130, 180],
    'MPG': [20, 35, 25, 32, 22],
    'Origin': ['USA', 'Japan', 'Europe', 'Japan', 'USA']
})
df
| | Name | Horsepower | MPG | Origin |
|---|---|---|---|---|
| 0 | Ford | 150 | 20 | USA |
| 1 | Toyota | 120 | 35 | Japan |
| 2 | BMW | 200 | 25 | Europe |
| 3 | Honda | 130 | 32 | Japan |
| 4 | Chevy | 180 | 22 | USA |
Using Matplotlib:
plt.bar(df['Name'], df['Horsepower'])
plt.title('Horsepower of Car')
plt.xlabel('Name')
plt.ylabel('Horsepower')
plt.xticks(rotation=45)
plt.show()
Using Altair:
chart = alt.Chart(df).mark_bar().encode(
    x='Name', y='Horsepower'
).properties(title='Horsepower of Car')
chart
2. Interactivity with graphs
Altair makes graphs interactive:
- Calling the .interactive() method on a chart lets us zoom in and out of the graph and drag (pan) it.
- We can add a brush selection that changes opacity to highlight a particular section.
- We can use tooltips to show details on hover.
print("\tTooltip")
from IPython.display import Image, display
display(Image(filename="1.jpg"))
Tooltip
print("\tHighlighting a specific section")
from IPython.display import Image, display
display(Image(filename="2.jpg"))
Highlighting a specific section
3. Data aggregation
In Altair we do not need to preprocess the data, that is, we do not have to build a groupby object beforehand; the aggregation is declared inside the encoding. With Matplotlib we first have to aggregate the data ourselves (for example with pandas) and then plot the result.
Matplotlib
avgmpg = df.groupby('Origin')['MPG'].mean()
plt.bar(avgmpg.index, avgmpg.values, color=['blue', 'green', 'red'])
plt.title('Average MPG by Origin')
plt.xlabel('Origin')
plt.ylabel('Average MPG')
plt.show()
Altair
chart = alt.Chart(df).mark_bar().encode(
    x='Origin', y=alt.Y('mean(MPG)'),
    color='Origin'
).properties(title='Average MPG by Origin')
chart
4. Handling Multiple plots
Altair makes it easy to combine multiple plots. We can layer more than one series in a single plot with the + operator, place two graphs horizontally (side by side) with the | operator, and stack them vertically with the & operator, which is much simpler than in Matplotlib.
Sample data for the examples below
import pandas as pd
import numpy as np
import altair as alt
import matplotlib.pyplot as plt
x_values = np.linspace(0, 5, 100)
data = pd.DataFrame({
    'x': x_values,
    'y_x2': x_values**2,
    'y_sqrt': np.sqrt(x_values)
})
Plotting multiple graphs on the same plot
Matplotlib
plt.figure(figsize=(8, 4))
plt.plot(data['x'], data['y_x2'], 'r-', label='y = x^2')
plt.plot(data['x'], data['y_sqrt'], 'b', label='y = sqrt(x)')
plt.title('y = x^2 and y = sqrt(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
Altair
x2 = alt.Chart(data).mark_line(color='red').encode(
    x='x', y='y_x2'
).properties(width=300, height=200)
sqrt = alt.Chart(data).mark_line(color='blue').encode(
    x='x', y='y_sqrt'
).properties(width=300, height=200)
(x2 + sqrt).properties(title='y = x^2 and y = sqrt(x)')
Plotting graphs side by side
Matplotlib
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(data['x'], data['y_x2'], 'r', label='y = x^2')
ax1.set_title('y = x^2')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.legend()
ax1.grid(True)

ax2.plot(data['x'], data['y_sqrt'], 'b', label='y = sqrt(x)')
ax2.set_title('y = sqrt(x)')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()
Altair
x2 = alt.Chart(data).mark_line(color='red').encode(
    x='x', y='y_x2'
).properties(width=200, height=200, title='y = x^2')
sqrt = alt.Chart(data).mark_line(color='blue').encode(
    x='x',
    y='y_sqrt'
).properties(width=200, height=200, title='y = sqrt(x)')
x2 | sqrt
Plotting graphs one above the other
Matplotlib
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 6))
ax1.plot(data['x'], data['y_x2'], 'r-', label='y = x^2')
ax1.set_title('y = x^2')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.legend()
ax1.grid(True)

ax2.plot(data['x'], data['y_sqrt'], 'b-', label='y = sqrt(x)')
ax2.set_title('y = sqrt(x)')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()
Altair
x2 = alt.Chart(data).mark_line(color='red').encode(
    x='x', y='y_x2'
).properties(width=300, height=150, title='y = x^2')
sqrt = alt.Chart(data).mark_line(color='blue').encode(
    x='x', y='y_sqrt'
).properties(width=300, height=150, title='y = sqrt(x)')
x2 & sqrt
USE CASES
Altair is widely used in different domains because of its interactive capabilities, declarative syntax, and smooth integration with Python data science tools. Here are some important areas where Altair proves to be especially useful:
1. Data Exploration & Analysis • Altair allows users to visualise distributions and relationships in datasets, making it perfect for rapid exploratory data analysis (EDA).
2. Interactive Dashboards & Reports • With Altair, users can create interactive plots that display dynamic distributions in environments like Jupyter Notebooks, Streamlit, or Voila.
3. Machine Learning & AI • It aids in important analyses such as visualizing model performance (e.g., confusion matrices, ROC curves) and understanding data distribution prior to model training.
4. Business Intelligence & Decision Making • Companies use Altair for data-driven storytelling to extract meaningful insights from their data.
5. Realtime Data Monitoring • Altair can be used to visualise streaming data, which is useful for monitoring website traffic, server performance, stock prices, etc.
6. Geographic Data • Altair supports geospatial plotting, making it valuable for mapping and location-based analytics.
CONCLUSION
Altair is an easy-to-use, powerful, and versatile visualization library built for contemporary data science workflows. It may not completely replace Matplotlib or Seaborn for highly customised visualisations, but it is ideal in scenarios where expressiveness, simplicity, and interactivity are important. For data analysts, machine learning practitioners, and businesses, Altair is a valuable tool to include in the data visualisation toolkit!
REFERENCES and FURTHER READINGS
Official Documentation:
https://altair-viz.github.io/user_guide/api.html#api
Video Tutorial:
1) https://www.youtube.com/watch?v=ms29ZPUKxbU by Jake VanderPlas
2) https://youtu.be/umTwkgQoo_E?si=DbeemSCUDsWo4mCX
Vega-Lite Docs:
https://vega.github.io/vega-lite-v2/docs/