Backstage demos beautifully and rots quietly. A platform team can have a portal running in an afternoon and a catalog full of stale, orphaned entities six months later. The thing that separates a useful internal developer portal from an abandoned tab is not the frontend; it is the discipline in the catalog model, the quality of the golden-path templates, and a docs pipeline that nobody has to think about. This is the build I reach for when a platform team has to stand up Backstage for real and keep it honest.
1. The architecture you are actually operating
Backstage is a monorepo of a React frontend app and a Node backend, glued by an app-config.yaml and a set of plugins. Since the deprecation of the legacy backend, the backend is a single process assembled from backend plugins and backend modules wired through a dependency-injection system. You do not edit a giant index.ts of createRouter calls anymore; you add() features to a createBackend() instance.
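// packages/backend/src/index.ts (new backend system)
import { createBackend } from '@backstage/backend-defaults';
const backend = createBackend();
backend.add(import('@backstage/plugin-app-backend'));
backend.add(import('@backstage/plugin-catalog-backend'));
// registers the Template kind so templates can live in the catalog
backend.add(import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'));
backend.add(import('@backstage/plugin-catalog-backend-module-github'));
backend.add(import('@backstage/plugin-catalog-backend-module-github-org'));
backend.add(import('@backstage/plugin-scaffolder-backend'));
// publish:github and the other GitHub actions ship as a separate module
backend.add(import('@backstage/plugin-scaffolder-backend-module-github'));
backend.add(import('@backstage/plugin-techdocs-backend'));
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));
backend.add(import('@backstage/plugin-permission-backend'));
// placeholder policy; replace it with your own policy module (section 7)
backend.add(import('@backstage/plugin-permission-backend-module-allow-all-policy'));
backend.start();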
The mental model that matters:
- Frontend plugins render under routes (/catalog, /create, /docs). They talk to backend plugins over HTTP.
- Backend plugins own a router and a slice of domain logic (catalog, scaffolder, techdocs).
- Backend modules extend a plugin without forking it — a GitHub entity provider is a module of the catalog plugin, a custom scaffolder action is a module of the scaffolder plugin. This is where almost all of your customization lives.
- backstage.json pins the platform version. Every @backstage/* package is upgraded as a coherent set, never piecemeal.
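To make the plugin/module split concrete, here is a toy model. It is not Backstage's API: createToyBackend, ExtensionPoint, and the feature shapes are hypothetical stand-ins that only mirror the rule that a module attaches to a plugin by pluginId and extends it without editing the plugin's code.

```typescript
// Toy model of plugin/module wiring. None of these names are Backstage APIs.
interface ExtensionPoint {
  providers: string[];
}

interface PluginFeature {
  kind: 'plugin';
  pluginId: string;
}

interface ModuleFeature {
  kind: 'module';
  pluginId: string; // the plugin this module extends
  moduleId: string;
  extend(point: ExtensionPoint): void;
}

type Feature = PluginFeature | ModuleFeature;

function createToyBackend() {
  const features: Feature[] = [];
  return {
    add(feature: Feature): void {
      features.push(feature);
    },
    // Creates one extension point per plugin, then lets every module extend
    // the plugin it targets. A module with no matching plugin is an error.
    start(): Map<string, ExtensionPoint> {
      const points = new Map<string, ExtensionPoint>();
      for (const f of features) {
        if (f.kind === 'plugin') points.set(f.pluginId, { providers: [] });
      }
      for (const f of features) {
        if (f.kind !== 'module') continue;
        const point = points.get(f.pluginId);
        if (!point) {
          throw new Error(`module ${f.moduleId} targets missing plugin ${f.pluginId}`);
        }
        f.extend(point);
      }
      return points;
    },
  };
}

const backend = createToyBackend();
backend.add({ kind: 'plugin', pluginId: 'catalog' });
backend.add({
  kind: 'module',
  pluginId: 'catalog',
  moduleId: 'github',
  extend: (point) => {
    point.providers.push('github-entity-provider');
  },
});
console.log(backend.start().get('catalog')?.providers);
```

The real system adds dependency injection for services, lifecycle hooks, and typed extension points that each plugin declares; the toy keeps only the attachment rule.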
Scaffold the app with the official CLI rather than cloning a starter — it gives you the current backend system out of the box:
npx @backstage/create-app@latest --path backstage
cd backstage
yarn install
yarn dev # frontend on :3000, backend on :7007
Resist the urge to fork and customize the frontend App.tsx heavily on day one. The leverage in Backstage is in the catalog and the scaffolder, not in bespoke React. Keep the frontend close to stock so platform upgrades stay a yarn backstage-cli versions:bump and not a merge-conflict marathon.
2. Modeling the catalog: get the entity kinds right
The catalog is a graph of typed entities. Get the kinds and relations right and everything else (ownership, on-call routing, dependency views) falls out for free. Get them wrong and you have a glorified spreadsheet.
The kinds you will actually use:
| Kind | Models | Key relations |
|---|---|---|
| Component | A deployable unit of software (a service, a website, a library) | ownedBy, partOf System, providesApi, consumesApi, dependsOn |
| API | A boundary contract (OpenAPI, gRPC, AsyncAPI) | apiProvidedBy, apiConsumedBy |
| System | A collection of components and resources with a shared purpose | hasPart, ownedBy |
| Resource | Infrastructure a component needs (a database, a bucket, a queue) | dependencyOf, partOf System |
| Group / User | Org structure, almost always synced from your IdP | memberOf, hasMember |
| Domain | A bounded area the business cares about | hasPart (Systems) |
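The "falls out for free" claim is easiest to see with dependencies. The sketch below walks dependsOn edges backwards to answer "what is affected if this database goes down?". MiniEntity and dependentsOf are hypothetical simplifications; the real catalog already stores the reverse dependencyOf relation, so in practice you would read that instead of inverting the graph yourself.

```typescript
interface MiniEntity {
  ref: string; // e.g. 'component:default/checkout-api'
  dependsOn: string[]; // refs this entity depends on
}

// Returns every entity that depends on `target`, directly or transitively.
function dependentsOf(target: string, entities: MiniEntity[]): string[] {
  // Invert the edges once: dependency ref -> refs of entities that depend on it.
  const reverse = new Map<string, string[]>();
  for (const e of entities) {
    for (const dep of e.dependsOn) {
      const list = reverse.get(dep) ?? [];
      list.push(e.ref);
      reverse.set(dep, list);
    }
  }
  // Breadth-first walk; `seen` stops cycles from looping forever.
  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of reverse.get(current) ?? []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  seen.delete(target); // a cycle can lead back to the target itself
  return Array.from(seen).sort();
}

const graph: MiniEntity[] = [
  { ref: 'component:default/checkout-api', dependsOn: ['resource:default/checkout-postgres'] },
  { ref: 'component:default/storefront', dependsOn: ['component:default/checkout-api'] },
  { ref: 'component:default/search', dependsOn: [] },
];
// checkout-api directly, storefront through checkout-api
console.log(dependentsOf('resource:default/checkout-postgres', graph));
```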
The discipline: every Component has an owner that resolves to a real Group, and belongs to a System. An entity with owner: guests or no system is a smell. A catalog-info.yaml that earns its place:
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-api
  title: Checkout API
  description: Handles cart-to-order conversion and payment intents.
  annotations:
    github.com/project-slug: acme/checkout-api
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/integration-key: <key>
  tags:
    - go
    - tier1
spec:
  type: service
  lifecycle: production
  owner: group:default/payments-team
  system: commerce
  providesApis:
    - checkout-api-v2
  dependsOn:
    - resource:default/checkout-postgres
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: checkout-api-v2
spec:
  type: openapi
  lifecycle: production
  owner: group:default/payments-team
  system: commerce
  definition:
    $text: ./openapi.yaml
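The ownership and System discipline can be checked mechanically. A minimal sketch, assuming you already have the set of Group refs produced by org ingestion: parseRef is a simplified, hypothetical stand-in for the real entity-ref parser in @backstage/catalog-model (it skips validation), and lintComponent flags a missing system or an owner that does not resolve to a known Group.

```typescript
interface Ref {
  kind: string;
  namespace: string;
  name: string;
}

// Accepts 'name', 'namespace/name', 'kind:name' and 'kind:namespace/name'.
function parseRef(ref: string, defaultKind: string): Ref {
  const colon = ref.indexOf(':');
  const kind = colon >= 0 ? ref.slice(0, colon) : defaultKind;
  const rest = colon >= 0 ? ref.slice(colon + 1) : ref;
  const slash = rest.indexOf('/');
  const namespace = slash >= 0 ? rest.slice(0, slash) : 'default';
  const name = slash >= 0 ? rest.slice(slash + 1) : rest;
  return { kind: kind.toLowerCase(), namespace, name };
}

interface ComponentDraft {
  name: string;
  owner?: string;
  system?: string;
}

// knownGroups holds full refs such as 'group:default/payments-team'.
function lintComponent(c: ComponentDraft, knownGroups: Set<string>): string[] {
  const problems: string[] = [];
  if (!c.owner) {
    problems.push(`${c.name}: spec.owner is missing`);
  } else {
    // A bare owner name defaults to the Group kind.
    const owner = parseRef(c.owner, 'group');
    const resolved = `${owner.kind}:${owner.namespace}/${owner.name}`;
    if (owner.kind !== 'group' || !knownGroups.has(resolved)) {
      problems.push(`${c.name}: owner ${c.owner} does not resolve to a known Group`);
    }
  }
  if (!c.system) problems.push(`${c.name}: spec.system is missing`);
  return problems;
}

const groups = new Set(['group:default/payments-team']);
console.log(lintComponent({ name: 'checkout-api', owner: 'group:default/payments-team', system: 'commerce' }, groups)); // []
// 'guests' is not a known Group and there is no system: two problems
console.log(lintComponent({ name: 'legacy-job', owner: 'guests' }, groups));
```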
The lifecycle field is not decoration — it drives filtering and lets you visually retire things. Use experimental, production, and deprecated consistently; a deprecated component that still shows as healthy is how teams keep calling dead APIs.
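To catch the "teams keep calling dead APIs" case, cross-reference lifecycles. A sketch over hypothetical, pre-fetched records (checkout-api-v1, storefront, mobile-bff, and old-admin are made-up names): it reports production components whose consumesApis list includes a deprecated API.

```typescript
type Lifecycle = 'experimental' | 'production' | 'deprecated';

interface ApiRecord {
  name: string;
  lifecycle: Lifecycle;
}

interface ComponentRecord {
  name: string;
  lifecycle: Lifecycle;
  consumesApis: string[];
}

// Production components that still consume a deprecated API.
function deprecatedUsage(apis: ApiRecord[], components: ComponentRecord[]): string[] {
  const deprecated = new Set(apis.filter((a) => a.lifecycle === 'deprecated').map((a) => a.name));
  const findings: string[] = [];
  for (const c of components) {
    if (c.lifecycle !== 'production') continue; // retired callers are not urgent
    for (const api of c.consumesApis) {
      if (deprecated.has(api)) findings.push(`${c.name} -> ${api}`);
    }
  }
  return findings;
}

const apis: ApiRecord[] = [
  { name: 'checkout-api-v1', lifecycle: 'deprecated' },
  { name: 'checkout-api-v2', lifecycle: 'production' },
];
const components: ComponentRecord[] = [
  { name: 'storefront', lifecycle: 'production', consumesApis: ['checkout-api-v1'] },
  { name: 'mobile-bff', lifecycle: 'production', consumesApis: ['checkout-api-v2'] },
  { name: 'old-admin', lifecycle: 'deprecated', consumesApis: ['checkout-api-v1'] },
];
console.log(deprecatedUsage(apis, components)); // only storefront is flagged
```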
3. Ingesting entities: discovery, not registration-by-hand
Manually registering each catalog-info.yaml through the UI does not scale past a demo. The catalog ingests entities through providers and processors that run on an interval. The two patterns you want:
GitHub discovery crawls an org for catalog-info.yaml files and registers what it finds. Add the catalog module and configure discovery in app-config.yaml:
catalog:
  providers:
    github:
      acmeOrg:
        organization: 'acme'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
Org ingestion pulls Users and Groups from your IdP so owner: group:default/payments-team actually resolves. For GitHub, the org provider ships in its own module, @backstage/plugin-catalog-backend-module-github-org, and mirrors teams and members:
catalog:
  providers:
    githubOrg:
      id: production
      githubUrl: 'https://github.com'
      orgs: ['acme']
      schedule:
        frequency: { hours: 1 }
        timeout: { minutes: 5 }
For Entra ID / Okta, swap in @backstage/plugin-catalog-backend-module-msgraph or the community Okta provider. The wiring is the same (add the module, configure it under catalog.providers); only the config block changes. The non-negotiable: org data comes from one authoritative source on a schedule, never hand-maintained YAML. A User that has left the company should disappear from the catalog within one sync, which means orphaned ownership surfaces automatically.
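One way to turn that into a to-do list is a scheduled report of entities whose owner no longer appears in the synced org data. The record shapes and the orphanedOwnership helper are hypothetical (as are alice, bob, and reports); in practice the owner refs would come from the catalog API and the org refs from the User and Group entities.

```typescript
interface OwnedEntity {
  ref: string;
  owner: string; // full ref, e.g. 'group:default/payments-team'
}

// Entities whose owner is absent from the latest org sync.
function orphanedOwnership(entities: OwnedEntity[], orgRefs: string[]): OwnedEntity[] {
  const present = new Set(orgRefs);
  return entities.filter((e) => !present.has(e.owner));
}

const afterSync = ['group:default/payments-team', 'user:default/alice'];
const owned: OwnedEntity[] = [
  { ref: 'component:default/checkout-api', owner: 'group:default/payments-team' },
  { ref: 'component:default/reports', owner: 'user:default/bob' }, // bob has left
];
console.log(orphanedOwnership(owned, afterSync).map((e) => e.ref)); // only reports
```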
Static catalog.locations entries are fine for the handful of platform-owned entities (your Systems, Domains, and shared Resources). Everything a service team owns should arrive through discovery so the source of truth is the code repo, not the portal.
4. Authoring Software Templates: the golden path as code
A Software Template (kind: Template) is the scaffolder’s unit of work: a set of parameters rendered as a form, then a sequence of steps that run actions. The built-in actions cover most of the path — fetch a skeleton, template it with the form values, publish a repo, register it back into the catalog.
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-service
  title: Go Service (golden path)
  description: Provisions a production-ready Go service with CI, TechDocs, and catalog registration.
  tags: [recommended, go]
spec:
  owner: group:default/platform-team
  type: service
  parameters:
    - title: Service details
      required: [name, owner]
      properties:
        name:
          title: Name
          type: string
          pattern: '^[a-z0-9-]+$'
        owner:
          title: Owner
          type: string
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        system:
          title: System
          type: string
          ui:field: EntityPicker
          ui:options:
            catalogFilter:
              kind: System
  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          system: ${{ parameters.system }}
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
        defaultBranch: main
        repoVisibility: internal
        protectDefaultBranch: true
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'
  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
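fetch:template renders the skeleton directory with Nunjucks: file contents and file and directory names can reference the step's values as ${{ values.name }}, ${{ values.owner }}, and so on. Files that must be copied untouched go in copyWithoutTemplating; if you set templateFileExtension, only files with that extension (.njk when set to true) are rendered. That is how catalog-info.yaml in the skeleton ends up with the right owner and System without the developer touching it.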
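A toy version of that rendering step shows that paths and contents are both templated. It only understands plain ${{ values.x }} lookups; real fetch:template runs full Nunjucks (filters, conditionals, loops), and renderString and renderSkeleton are hypothetical helpers.

```typescript
type Values = Record<string, string>;

// Replaces each ${{ values.key }} placeholder; unknown keys are an error.
function renderString(template: string, values: Values): string {
  return template.replace(/\$\{\{\s*values\.(\w+)\s*\}\}/g, (_match, key: string) => {
    const value = values[key];
    if (value === undefined) throw new Error(`no value for ${key}`);
    return value;
  });
}

// Renders both the path and the content of every skeleton file.
function renderSkeleton(files: Record<string, string>, values: Values): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [path, content] of Object.entries(files)) {
    out[renderString(path, values)] = renderString(content, values);
  }
  return out;
}

const skeleton: Record<string, string> = {
  'catalog-info.yaml': 'metadata:\n  name: ${{ values.name }}\nspec:\n  owner: ${{ values.owner }}\n',
  'cmd/${{ values.name }}/main.go': 'package main // ${{ values.name }}\n',
};
const rendered = renderSkeleton(skeleton, {
  name: 'checkout-api',
  owner: 'group:default/payments-team',
});
console.log(Object.keys(rendered)); // catalog-info.yaml and cmd/checkout-api/main.go
```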
Custom actions live in a backend module
When the built-in actions run out — you need to open a Jira ticket, register a service in PagerDuty, or call an internal provisioning API — you write a custom action as a backend module. Do not fork the scaffolder; extend it.
// plugins/scaffolder-backend-module-acme/src/actions/registerOnCall.ts
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';
export const registerOnCallAction = () =>
  createTemplateAction<{ service: string; team: string }>({
    id: 'acme:oncall:register',
    schema: {
      input: {
        type: 'object',
        required: ['service', 'team'],
        properties: {
          service: { type: 'string' },
          team: { type: 'string' },
        },
      },
    },
    async handler(ctx) {
      const { service, team } = ctx.input;
      ctx.logger.info(`Registering ${service} on-call for ${team}`);
      // call your provisioning / PagerDuty API here
      ctx.output('escalationPolicyId', 'PXXXXXX');
    },
  });
// plugins/scaffolder-backend-module-acme/src/module.ts
// register the action as a scaffolder module
import { createBackendModule } from '@backstage/backend-plugin-api';
import { scaffolderActionsExtensionPoint } from '@backstage/plugin-scaffolder-node/alpha';
import { registerOnCallAction } from './actions/registerOnCall';
export const scaffolderModuleAcme = createBackendModule({
  pluginId: 'scaffolder',
  moduleId: 'acme-actions',
  register(env) {
    env.registerInit({
      deps: { scaffolder: scaffolderActionsExtensionPoint },
      async init({ scaffolder }) {
        scaffolder.addActions(registerOnCallAction());
      },
    });
  },
});
// plugins/scaffolder-backend-module-acme/src/index.ts
// backend.add(import(...)) loads the package's default export
export { scaffolderModuleAcme as default } from './module';
Then backend.add(import('@internal/plugin-scaffolder-backend-module-acme')) in index.ts, and acme:oncall:register is available as a step.
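Keeping the handler body thin makes it testable without a running backend. The sketch lifts the handler above into a plain function and drives it with FakeCtx, a hand-rolled, hypothetical stand-in carrying only the fields the handler uses (input, logger.info, output); it is not the scaffolder's real context type.

```typescript
// Stand-in for the parts of the action context the handler touches.
interface FakeCtx<I> {
  input: I;
  logger: { info(message: string): void };
  output(name: string, value: unknown): void;
}

type OnCallInput = { service: string; team: string };

// Same body as the handler above, as a plain function.
async function registerOnCallHandler(ctx: FakeCtx<OnCallInput>): Promise<void> {
  const { service, team } = ctx.input;
  ctx.logger.info(`Registering ${service} on-call for ${team}`);
  ctx.output('escalationPolicyId', 'PXXXXXX');
}

// Runs the handler and captures what it logged and emitted.
async function runWithFakeCtx(input: OnCallInput) {
  const logs: string[] = [];
  const outputs: Record<string, unknown> = {};
  await registerOnCallHandler({
    input,
    logger: { info: (message) => { logs.push(message); } },
    output: (name, value) => { outputs[name] = value; },
  });
  return { logs, outputs };
}

runWithFakeCtx({ service: 'checkout-api', team: 'payments-team' }).then(({ logs, outputs }) => {
  console.log(logs[0]); // Registering checkout-api on-call for payments-team
  console.log(outputs['escalationPolicyId']); // PXXXXXX
});
```

When the handler grows a real API call, passing that client in as a parameter keeps the same fake-driven test possible.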
5. Golden-path scaffolding that provisions real infrastructure
The template above creates a repo. A golden path provisions the whole vertical: repo, CI, and the cloud resources the service needs. Two patterns, and the choice matters.
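Pattern A — the template commits IaC and lets your platform pipeline apply it. The skeleton ships a Terraform module that your CI pipeline plans and applies. This keeps a single source of truth (Git) and means the scaffolder never holds cloud credentials. It is the pattern I default to. publish:github:pull-request copies files from the workspace rather than templating them, so render the infra skeleton with fetch:template first:
# run these after publish:github so ./infra does not land in the service repo
- id: fetch-infra
  name: Render infra skeleton
  action: fetch:template
  input:
    url: ./infra-skeleton
    targetPath: ./infra
    values:
      name: ${{ parameters.name }}
- id: terraform-pr
  name: Open infra PR
  action: publish:github:pull-request
  input:
    repoUrl: github.com?owner=acme&repo=infra-live
    branchName: provision-${{ parameters.name }}
    title: 'Provision infra for ${{ parameters.name }}'
    description: 'Adds Postgres + bucket for ${{ parameters.name }}'
    targetPath: 'services/${{ parameters.name }}'
    sourcePath: ./infra
The skeleton's Terraform is plain HCL; fetch:template substitutes the service name before the PR is opened:
module "db" {
  source = "../../modules/postgres"
  name   = "${{ values.name }}"
  size   = "small"
  tier   = "tier1"
}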
Pattern B — the template calls a custom action that provisions synchronously. Faster feedback, but now the scaffolder needs cloud credentials and you own the rollback story when step 4 fails after step 3 created a database. Only reach for this when a same-session result is a hard requirement.
The failure mode that bites everyone: a multi-step template that is not idempotent. If publish:github succeeds and catalog:register fails, a re-run either fails at publish because the repo already exists or, if someone renames the service to get past the error, leaves a second repo behind. Make custom actions idempotent (check-then-create), and prefer the PR pattern so a human merge is the commit point. Treat scaffolder runs as best-effort orchestration, not a transaction.
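A minimal sketch of check-then-create, assuming a hypothetical RepoHost interface in front of whatever API your custom action calls. The second call models a re-run after a later step failed: it finds the existing repo and reuses it instead of creating another.

```typescript
// Hypothetical interface over the system your action provisions into.
interface RepoHost {
  find(name: string): Promise<string | undefined>; // repo URL if it exists
  create(name: string): Promise<string>;
}

async function ensureRepo(host: RepoHost, name: string): Promise<{ url: string; created: boolean }> {
  const existing = await host.find(name);
  if (existing) return { url: existing, created: false }; // re-run: reuse, don't duplicate
  return { url: await host.create(name), created: true };
}

// In-memory host for demonstration only.
function memoryHost(): RepoHost & { count(): number } {
  const repos = new Map<string, string>();
  return {
    async find(name) {
      return repos.get(name);
    },
    async create(name) {
      const url = `https://github.com/acme/${name}`;
      repos.set(name, url);
      return url;
    },
    count: () => repos.size,
  };
}

async function demo() {
  const host = memoryHost();
  const first = await ensureRepo(host, 'checkout-api');
  const second = await ensureRepo(host, 'checkout-api'); // simulated re-run
  console.log(first.created, second.created, host.count()); // true false 1
}
void demo();
```

Check-then-create still races if two runs overlap, so the create path should also treat an "already exists" response from the real API as success.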
6. Publishing docs with TechDocs and a CI build pipeline
TechDocs renders MkDocs-built static sites inside Backstage, keyed off the backstage.io/techdocs-ref annotation. The decision that defines whether TechDocs scales is the generation strategy.
The local builder (Backstage generates docs on the fly) is fine for yarn dev and a disaster in production: a page view can trigger a build, and your portal needs MkDocs and the right plugins installed. The production pattern is to build docs in CI, publish the static site to object storage, and configure TechDocs to only read from there:
techdocs:
  builder: 'external' # Backstage does not build; it serves
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-techdocs'
      region: 'eu-west-1'
Each repo builds and pushes its own docs on merge using the techdocs-cli:
# .github/workflows/techdocs.yml
name: TechDocs
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm i -g @techdocs/cli
      - uses: actions/setup-python@v5
        with: { python-version: '3.x' }
      - run: pip install mkdocs-techdocs-core
      - run: techdocs-cli generate --no-docker --verbose
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/techdocs-publisher
          aws-region: eu-west-1
      - run: |
          techdocs-cli publish \
            --publisher-type awsS3 \
            --storage-name acme-techdocs \
            --entity default/component/${{ github.event.repository.name }}
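The CI job and the portal only agree if they compute the same storage path. A sketch, assuming the default layout of a lowercased namespace/kind/name triplet (the same triplet passed to techdocs-cli publish --entity) and a namespace of default when the ref omits one; techdocsPrefix is a hypothetical helper, not a TechDocs export.

```typescript
// Entity ref -> storage prefix under the assumed lowercase triplet layout.
function techdocsPrefix(entityRef: string): string {
  const match = /^([^:/]+):(?:([^/]+)\/)?([^/]+)$/.exec(entityRef);
  if (!match) throw new Error(`expected kind:[namespace/]name, got ${entityRef}`);
  const kind = match[1] ?? '';
  const namespace = match[2] ?? 'default'; // an omitted namespace means default
  const name = match[3] ?? '';
  return [namespace, kind, name].map((part) => part.toLowerCase()).join('/');
}

// Matches the --entity argument in the workflow above...
console.log(techdocsPrefix('component:default/checkout-api')); // default/component/checkout-api
// ...and the path the portal serves under /api/techdocs/static/docs/.
console.log(`/api/techdocs/static/docs/${techdocsPrefix('Component:checkout-api')}/index.html`);
```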
The service repo carries an mkdocs.yml and a docs/ tree; the backstage.io/techdocs-ref: dir:. annotation tells Backstage where to look. Docs-as-code means the docs live beside the code, review in the same PR, and go stale visibly when the diff touches behavior but not docs/.
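The "go stale visibly" part can be enforced in the same workflow. A sketch of a hypothetical check that flags a change touching code but neither docs/ nor mkdocs.yml; the code prefixes are assumptions to adjust per repo, and the file list would come from something like git diff --name-only against the base branch.

```typescript
// Assumed code locations; adjust per repo.
const CODE_PREFIXES = ['src/', 'cmd/', 'internal/'];

function docsLikelyStale(changedFiles: string[], codePrefixes: string[] = CODE_PREFIXES): boolean {
  const touchesCode = changedFiles.some((file) => codePrefixes.some((prefix) => file.startsWith(prefix)));
  const touchesDocs = changedFiles.some((file) => file.startsWith('docs/') || file === 'mkdocs.yml');
  return touchesCode && !touchesDocs;
}

console.log(docsLikelyStale(['internal/payments/intent.go'])); // true
console.log(docsLikelyStale(['internal/payments/intent.go', 'docs/payments.md'])); // false
console.log(docsLikelyStale(['README.md'])); // false: no code touched
```

Not every code change needs a docs change, so surfacing this as a PR warning rather than a hard failure is usually the better fit.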
7. Authentication, the permissions framework, and RBAC
Authentication and authorization are two separate systems in Backstage, and conflating them is a classic mistake.
Auth signs the user in and resolves them to a catalog User entity. The sign-in resolver is the load-bearing piece — it maps an OAuth identity to a Backstage identity, which is what every ownership and permission check keys off:
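// the resolver decides which catalog User an OAuth login maps to; register this
// factory as providerId 'github' from your own auth module, in place of the stock one
import { githubAuthenticator } from '@backstage/plugin-auth-backend-module-github-provider';
import { createOAuthProviderFactory } from '@backstage/plugin-auth-node';
// resolver: match the GitHub username to a User entity's name (the built-in
// usernameMatchingUserEntityName resolver does the same through config)
export const githubProviderFactory = createOAuthProviderFactory({
  authenticator: githubAuthenticator,
  async signInResolver(info, ctx) {
    const username = info.result.fullProfile.username;
    if (!username) throw new Error('GitHub profile has no username');
    return ctx.signInWithCatalogUser({ entityRef: { name: username } });
  },
});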
Permissions decide what an authenticated user can do. The permission framework is opt-in: you write a PermissionPolicy and the backend enforces it at every plugin that emits permission checks (catalog, scaffolder). A policy that lets a user delete only the entities they own and allows everything else:
// a permission policy keyed off ownership; register it from a module on the
// 'permission' plugin via policyExtensionPoint.setPolicy(new OwnershipPolicy())
import { AuthorizeResult, PolicyDecision, isPermission } from '@backstage/plugin-permission-common';
import { PermissionPolicy, PolicyQuery, PolicyQueryUser } from '@backstage/plugin-permission-node';
import { catalogEntityDeletePermission } from '@backstage/plugin-catalog-common/alpha';
import { createCatalogConditionalDecision, catalogConditions }
  from '@backstage/plugin-catalog-backend/alpha';
export class OwnershipPolicy implements PermissionPolicy {
  async handle(request: PolicyQuery, user?: PolicyQueryUser): Promise<PolicyDecision> {
    if (isPermission(request.permission, catalogEntityDeletePermission)) {
      // conditional decision: the catalog applies IS_ENTITY_OWNER as a filter
      return createCatalogConditionalDecision(
        request.permission,
        catalogConditions.isEntityOwner({ claims: user?.info.ownershipEntityRefs ?? [] }),
      );
    }
    return { result: AuthorizeResult.ALLOW };
  }
}
The conditional decision is the powerful bit: instead of a flat allow/deny, the catalog applies the IS_ENTITY_OWNER rule as a filter, so the same policy works whether the user is acting on one entity or listing thousands. For role-based access at scale, the community RBAC plugin (@backstage-community/plugin-rbac) layers a managed UI and role/permission CSV on top of the framework so you are not redeploying for every policy tweak.
Start with authentication and ownership resolution working end to end before you turn on the permission policy. A broken sign-in resolver makes every user own nothing, and an ownership-based policy then denies everyone — which reads as “permissions are broken” when the real fault is one line in the resolver.
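A toy model of why the conditional decision scales, and why a broken resolver denies everyone. The policy returns a predicate rather than a verdict, and the caller applies it to one entity or a whole list; with no ownership claims the same predicate matches nothing. CatalogItem and this isEntityOwner are illustrative stand-ins, not the catalog's implementation of IS_ENTITY_OWNER.

```typescript
interface CatalogItem {
  ref: string;
  ownerRefs: string[]; // refs this entity is ownedBy
}

type Condition = (item: CatalogItem) => boolean;

// The "policy" hands back a predicate instead of allow/deny.
function isEntityOwner(claims: string[]): Condition {
  const mine = new Set(claims);
  return (item) => item.ownerRefs.some((ref) => mine.has(ref));
}

const items: CatalogItem[] = [
  { ref: 'component:default/checkout-api', ownerRefs: ['group:default/payments-team'] },
  { ref: 'component:default/search', ownerRefs: ['group:default/search-team'] },
];

// A working resolver yields ownership claims (the User plus their Groups)...
const canDelete = isEntityOwner(['user:default/alice', 'group:default/payments-team']);
console.log(canDelete(items[0]!)); // single entity: true
console.log(items.filter(canDelete).map((i) => i.ref)); // list: only checkout-api
// ...a broken resolver yields no claims, so the same rule matches nothing.
console.log(items.filter(isEntityOwner([])).length); // 0
```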
8. Production deployment and keeping the catalog from rotting
Backstage ships as a Docker image you build from the monorepo and run on Kubernetes. The production-grade choices:
- Postgres, not SQLite. SQLite is the dev default and loses your catalog on restart. Point backend.database at managed Postgres.
- Externalized config and secrets. Mount app-config.production.yaml and pull tokens from your secret manager; never bake them into the image.
- Object storage for TechDocs, as in section 6, so the portal stays stateless and horizontally scalable.
# build the backend image from the repo root
yarn install --immutable
yarn tsc
yarn build:backend
docker build . -f packages/backend/Dockerfile --tag acme/backstage:$(git rev-parse --short HEAD)
# app-config.production.yaml
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: 5432
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
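Because the production config leans on ${...} environment substitution, a preflight that lists every referenced variable and reports the unset ones lets a deploy fail with the complete list up front. A sketch with hypothetical helpers (referencedVars, missingVars); it only scans for ${NAME} placeholders and does not reproduce Backstage's own substitution rules, such as escaping.

```typescript
// Collects the distinct ${NAME} placeholders in a config file's text.
function referencedVars(configText: string): string[] {
  const names = new Set<string>();
  const pattern = /\$\{([A-Za-z0-9_]+)\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(configText)) !== null) {
    const name = match[1];
    if (name) names.add(name);
  }
  return Array.from(names).sort();
}

// Placeholders with no (or an empty) value in the given environment.
function missingVars(configText: string, env: Record<string, string | undefined>): string[] {
  return referencedVars(configText).filter((name) => !env[name]);
}

const productionConfig = [
  'backend:',
  '  database:',
  '    client: pg',
  '    connection:',
  '      host: ${POSTGRES_HOST}',
  '      port: 5432',
  '      user: ${POSTGRES_USER}',
  '      password: ${POSTGRES_PASSWORD}',
].join('\n');

// In a real entrypoint you would pass process.env here.
const missing = missingVars(productionConfig, { POSTGRES_HOST: 'db.internal' });
if (missing.length > 0) {
  console.error(`missing environment variables: ${missing.join(', ')}`);
}
```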
Upgrades are a recurring chore, not a project. The Backstage release cadence is roughly monthly, and the supported path is the version bump tool plus the published upgrade-helper diff:
yarn backstage-cli versions:bump
# then reconcile packages/app and packages/backend against the
# upgrade-helper diff for your from->to versions, run yarn dedupe
Falling more than a few releases behind turns a routine bump into an archaeology project — the new backend system, the alpha permission APIs, and the auth module split all landed across versions, and skipping them compounds the migration cost.
Keeping the catalog honest is the work that never ends and matters most. The mechanisms:
- orphanStrategy: delete, so entities whose source location disappears are removed rather than left as zombies.
- Per-entity processing errors, routed to owners: wire alerts (for example through the Backstage notifications plugin) so an owner hears when their entity fails to process.
- Entity validation in CI with techdocs-cli and a catalog-info.yaml schema check, so a malformed descriptor fails the PR, not the nightly sync.
catalog:
  orphanStrategy: delete
  processingInterval: { minutes: 30 }
The catalog is a product with an SLA on freshness, not a wiki. If a service can ship without its catalog entry being correct, the catalog will be wrong, and a wrong catalog is worse than no catalog because people stop trusting it.
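The CI descriptor check from the list above can start small. A sketch of a hypothetical validateDescriptor, run on each parsed document of catalog-info.yaml (parsed with whatever YAML library your CI already uses): it enforces the metadata.name format (1 to 63 characters, runs of [a-zA-Z0-9] separated by -, _ or .) and the required Component fields, plus the lifecycle vocabulary from section 2 as a house rule rather than a catalog requirement. It is not the full entity schema.

```typescript
// Catalog rule: 1-63 chars, runs of [a-zA-Z0-9] optionally separated by -, _ or .
const NAME_PATTERN = /^[a-zA-Z0-9]+(?:[-_.][a-zA-Z0-9]+)*$/;
// House rule from section 2; the catalog itself accepts any lifecycle string.
const LIFECYCLES = ['experimental', 'production', 'deprecated'];

interface Descriptor {
  apiVersion?: string;
  kind?: string;
  metadata?: { name?: string };
  spec?: { type?: string; lifecycle?: string; owner?: string };
}

function validateDescriptor(d: Descriptor): string[] {
  const errors: string[] = [];
  if (!d.apiVersion) errors.push('apiVersion is required');
  if (!d.kind) errors.push('kind is required');
  const name = d.metadata?.name;
  if (!name) {
    errors.push('metadata.name is required');
  } else if (name.length > 63 || !NAME_PATTERN.test(name)) {
    errors.push(`metadata.name "${name}" is not a valid entity name`);
  }
  if (d.kind === 'Component') {
    for (const field of ['type', 'lifecycle', 'owner'] as const) {
      if (!d.spec?.[field]) errors.push(`spec.${field} is required for Component`);
    }
    const lifecycle = d.spec?.lifecycle;
    if (lifecycle && !LIFECYCLES.includes(lifecycle)) {
      errors.push(`spec.lifecycle "${lifecycle}" is not one of ${LIFECYCLES.join(', ')}`);
    }
  }
  return errors;
}

console.log(validateDescriptor({
  apiVersion: 'backstage.io/v1alpha1',
  kind: 'Component',
  metadata: { name: 'checkout-api' },
  spec: { type: 'service', lifecycle: 'production', owner: 'group:default/payments-team' },
})); // []
console.log(validateDescriptor({
  kind: 'Component',
  metadata: { name: 'Checkout API' },
  spec: { type: 'service', lifecycle: 'prod' },
}).length); // 4
```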
Enterprise scenario
A 600-engineer fintech rolled Backstage out as the mandated front door for service creation. Within a quarter the catalog showed 1,400 components against roughly 300 real services. The cause: their CI bot ran catalog:register on every feature branch that contained a catalog-info.yaml, and GitHub discovery was configured without a branch filter, so every long-lived branch minted a duplicate entity. Ownership views were unusable and the platform team was fielding “why are there nine checkout services” tickets weekly.
The constraint: they could not stop teams from working on branches, and they could not hand-prune 1,100 entities.
The fix was two lines of configuration and one policy. First, pin discovery to the default branch so only main produces catalog entities:
catalog:
  providers:
    github:
      acmeOrg:
        organization: 'acme'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main' # branches no longer mint entities
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
  orphanStrategy: delete # duplicates from old branches get reaped
Then they moved catalog:register out of per-branch CI entirely — registration became a one-time scaffolder output, and steady state came from discovery alone. With orphanStrategy: delete, the 1,100 branch-derived duplicates fell out of the catalog over the next two sync cycles without anyone deleting a thing. Component count settled at 312. The lesson the team wrote into their platform runbook: the catalog must have exactly one ingestion path per entity, and that path must be the default branch. Two paths is how you get duplicates; a non-default branch is how you get noise.
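The "one ingestion path" rule can be audited. A sketch that flags entities whose backstage.io/managed-by-location annotation points somewhere other than the default branch; the annotation is real, but the GitHub URL shape it parses (.../blob/<branch>/...), the offDefaultBranch helper, and the sample entity names are assumptions for illustration. A branch name containing a slash is cut at its first segment, which is still enough to flag it.

```typescript
interface LocatedEntity {
  ref: string;
  annotations: Record<string, string>;
}

// Refs of entities whose managing location is not on the default branch.
function offDefaultBranch(entities: LocatedEntity[], defaultBranch = 'main'): string[] {
  return entities
    .filter((e) => {
      const location = e.annotations['backstage.io/managed-by-location'] ?? '';
      // Assumed GitHub URL shape: .../<owner>/<repo>/blob/<branch>/<path>
      const branch = /\/(?:blob|tree)\/([^/]+)\//.exec(location)?.[1];
      return branch !== undefined && branch !== defaultBranch;
    })
    .map((e) => e.ref);
}

const sample: LocatedEntity[] = [
  {
    ref: 'component:default/checkout-api',
    annotations: { 'backstage.io/managed-by-location': 'url:https://github.com/acme/checkout-api/blob/main/catalog-info.yaml' },
  },
  {
    ref: 'component:default/checkout-api-retry',
    annotations: { 'backstage.io/managed-by-location': 'url:https://github.com/acme/checkout-api/blob/feature-retry/catalog-info.yaml' },
  },
];
console.log(offDefaultBranch(sample)); // only the feature-branch entity
```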
Verify
Confirm the portal works end to end before you call it done:
# These calls need a token once the backend enforces auth; one option is a static
# token under backend.auth.externalAccess, exported here as BACKSTAGE_TOKEN.
AUTH="Authorization: Bearer ${BACKSTAGE_TOKEN}"
# 1. Backend health and catalog API responding
curl -s -H "$AUTH" http://localhost:7007/api/catalog/entities | jq 'length'
# 2. Org data ingested: Users and Groups resolve
curl -s -H "$AUTH" 'http://localhost:7007/api/catalog/entities?filter=kind=group' | jq '.[].metadata.name'
# 3. No orphaned entities left behind (orphanStrategy: delete should keep this at 0)
curl -s -H "$AUTH" 'http://localhost:7007/api/catalog/entities' \
  | jq '[.[] | select(.metadata.annotations["backstage.io/orphan"]=="true")] | length'
# 4. The custom scaffolder action is registered
curl -s -H "$AUTH" http://localhost:7007/api/scaffolder/v2/actions | jq '.[].id' | grep acme
# 5. TechDocs is served from storage, not built locally
curl -s -H "$AUTH" -I 'http://localhost:7007/api/techdocs/static/docs/default/component/checkout-api/index.html'
Then in the UI: load /create, run the golden-path template against a throwaway repo, and confirm the new component appears in /catalog with the right owner and System, and its docs render under /docs. If sign-in resolves you to a User and that user owns the entity you just created, the auth-to-ownership chain is sound.