Welcome to Inverse AI!
AI alignment and interpretability
Over the last decade, rapid advances in deep learning have brought remarkable capabilities, but also opaque “black-box” decisions, unfair bias, and safety hazards. The timeline below captures the landmark discoveries, tools, incidents, and regulations that drove the global push toward explainable, transparent, and aligned AI.
Nick Bostrom’s 2014 bestseller Superintelligence moves existential-risk and alignment concerns from academia into mainstream policy circles.
Google Photos’ “gorillas” mislabeling incident becomes a rallying cry for fairness metrics and post-hoc explanation tooling in vision models.
Ribeiro et al. open-source LIME, which explains individual predictions with local surrogate models (a LIME sketch follows the timeline); Microsoft’s Tay turns racist within 16 h; ProPublica’s “Machine Bias” investigation reveals racial bias in COMPAS recidivism scores. XAI moves from research topic to necessity.
DARPA launches its Explainable AI (XAI) program, funding interpretable ML for the US DoD; Lundberg & Lee unveil game-theoretic SHAP, which soon becomes the industry default for local feature attributions (a SHAP sketch follows the timeline).
The EU’s General Data Protection Regulation (GDPR) takes effect, codifying users’ right to meaningful information about automated decisions and driving corporate adoption of XAI dashboards.
IBM launches AI Explainability 360 (AIX360); Facebook releases Captum for PyTorch attribution (a Captum sketch follows the timeline); Microsoft ships InterpretML. Amazon’s biased recruiting AI is exposed & the project is scrapped.
Olah and collaborators publish their circuits work, dissecting individual neurons and attention heads inside neural networks and kick-starting modern mechanistic interpretability. Google releases the Language Interpretability Tool (LIT) for NLP model inspection.
The EU AI Act, the first comprehensive AI regulation, demands transparency, human oversight, and technical documentation for high-risk systems.
OpenAI’s ChatGPT wows 100 M users but fabricates facts; Meta pulls Galactica after three days of false citations, a wake-up call for LLM transparency & safety.
Anthropic debuts Constitutional AI; OpenAI commits 20 % of its compute to the Superalignment project, aiming to solve superintelligence alignment within four years (by 2027).
Companies scramble to embed explainability & risk management ahead of the EU AI Act’s 2026 enforcement deadlines; the UAE issues updated AI Ethics Guidelines.
Anthropic & open-research partners publish a large-scale atlas of interpretable features inside Claude 3 Sonnet, demonstrating that large-scale transparency is feasible and setting a new bar for model auditing.
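
To give a concrete feel for the tooling milestones above, here are three minimal Python sketches. First, LIME: it explains one prediction by perturbing the input and fitting a sparse linear surrogate around it. This is a sketch only, assuming the lime and scikit-learn packages are installed; the Iris data and random-forest model are illustrative stand-ins, not anything referenced in the timeline.

```python
# Minimal LIME sketch: explain one prediction of a black-box classifier
# by fitting a local, sparse linear surrogate around it.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)

# Perturb the chosen instance, query the model on the perturbations, and fit
# a weighted linear surrogate; as_list() reports the locally most influential features.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=3
)
print(explanation.as_list())
```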
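SHAP generalizes this idea with Shapley values from cooperative game theory: each feature’s attribution is its average marginal contribution to the prediction. Another minimal sketch, again with an illustrative dataset and model, assuming the shap and scikit-learn packages:

```python
# Minimal SHAP sketch: exact Shapley-value attributions for a tree model.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)            # fast exact values for tree ensembles
shap_values = explainer.shap_values(X.iloc[:1])  # attributions for a single prediction

# Each value is the feature's contribution relative to the model's expected
# output; together they sum to (prediction - expected_value).
for name, value in zip(X.columns, shap_values[0]):
    print(f"{name:>6}: {value:+.2f}")
```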
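Captum brings gradient-based attribution to PyTorch. The sketch below applies Integrated Gradients to a toy two-layer network; the model and random input are purely illustrative, and torch plus captum are assumed installed.

```python
# Minimal Captum sketch: Integrated Gradients on a toy PyTorch classifier.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy two-layer network over 4 input features and 2 output classes.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

inputs = torch.rand(1, 4)  # a single (random) example to explain

ig = IntegratedGradients(model)
# Integrate gradients along a straight path from a zero baseline to the
# input, attributing the class-1 logit back to the 4 input features.
attributions, delta = ig.attribute(
    inputs, target=1, return_convergence_delta=True
)
print(attributions)  # per-feature contribution scores
print(delta)         # how far the attributions are from the exact path integral
```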