
AI Systems Architect
Added 11/24/2025
This Role is Closed
Role Overview
Reins AI is hiring an Evaluation & Monitoring Architect to design the operating model for AI reliability in regulated domains. You’ll define the end-to-end architecture (agentic frameworks, evaluation data flows, observability integrations, scorecards, and release gates) and extend that architecture into simulations: mirrored agentic environments that let us generate synthetic telemetry, run validators, and stress-test reliability loops before production. You’ll partner with our Adaptation and Product teams to ensure every improvement ships with regression coverage, measurable reliability gains, and reusable MCP-style services for clients.
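As a purely illustrative sketch of the simulation idea, a mirrored environment can emit synthetic agent traces and run a validator over them before anything reaches production. Every name below (`SyntheticTrace`, `simulate_agent_run`, `validate_trace`) is a hypothetical stand-in, not part of Reins AI's actual stack.

```python
"""Toy 'mirrored environment': generate synthetic agent traces, then validate
them offline. Illustrative only; not Reins AI's framework."""
import random
from dataclasses import dataclass


@dataclass
class SyntheticTrace:
    run_id: int
    steps: int
    tool_errors: int
    final_answer: str


def simulate_agent_run(run_id: int, rng: random.Random) -> SyntheticTrace:
    """Stand-in for replaying an agent against a mirrored environment."""
    steps = rng.randint(3, 12)
    tool_errors = rng.randint(0, 2)
    answer = "escalate" if tool_errors else "resolved"
    return SyntheticTrace(run_id, steps, tool_errors, answer)


def validate_trace(trace: SyntheticTrace, max_steps: int = 10) -> bool:
    """Reliability check applied to synthetic telemetry before release."""
    return trace.steps <= max_steps and trace.tool_errors == 0


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the stress test is reproducible
    traces = [simulate_agent_run(i, rng) for i in range(100)]
    pass_rate = sum(validate_trace(t) for t in traces) / len(traces)
    print(f"synthetic pass rate: {pass_rate:.2%}")
```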
Responsibilities
- System Architecture: Define the reference architecture for evaluation, monitoring, and agentic observability (ingest → evaluate → triage → verify → score → report).
- Evaluator Frameworks: Standardize evaluator patterns (task success, hallucination/factuality, adherence to procedure, suitability/reliability) with well-defined APIs and regression tests; a rough sketch of this pattern follows the list.
- Observability Integration: Integrate traces and metrics (LangSmith/OpenInference, OpenTelemetry, Arize, Grafana) with dashboards and SLOs for agentic and multi-agent systems.
- Scorecards & Gates: Establish reliability KPIs (pass-rates, variance, MTTR, calibration) and “ready-to-ship” gates; automate backtests and regressions.
- Workflows & Handoffs: Design triage queues, escalation paths, and ownership models so Delivery/Client teams can operate independently.
- Governance: Define test-set stewardship (golden traces, thresholds, update cadence), versioning, change logs, and audit trails.
- Enablement: Produce playbooks, runbooks, and quick-start guides; deliver internal/client training.
- Partnerships: Work with Test Designers (what to measure), Implementation Engineers (how it runs), and Adaptation (what to change) to close the loop.
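To make the evaluator-and-gate pattern above concrete, here is a minimal Python sketch under stated assumptions: the names (`Evaluator`, `EvalResult`, `TaskSuccessEvaluator`, `release_gate`, the 0.9 pass-rate threshold, and the golden-trace fixtures) are hypothetical illustrations, not Reins AI's actual framework or APIs.

```python
"""Hedged sketch of a standardized evaluator API feeding a release gate.
Illustrative names and thresholds only."""
from dataclasses import dataclass
from statistics import mean
from typing import Protocol


@dataclass
class EvalResult:
    case_id: str
    passed: bool
    score: float  # 0.0-1.0, evaluator-specific


class Evaluator(Protocol):
    """Common API so task-success, factuality, and procedure-adherence
    evaluators are interchangeable in the pipeline."""
    name: str

    def evaluate(self, case: dict) -> EvalResult: ...


class TaskSuccessEvaluator:
    name = "task_success"

    def evaluate(self, case: dict) -> EvalResult:
        # Toy check: did the agent's output contain the expected answer?
        passed = case["expected"].lower() in case["output"].lower()
        return EvalResult(case["id"], passed, 1.0 if passed else 0.0)


def release_gate(results: list[EvalResult], min_pass_rate: float = 0.9) -> bool:
    """'Ready-to-ship' gate: block release if the pass-rate KPI drops below threshold."""
    pass_rate = mean(1.0 if r.passed else 0.0 for r in results)
    print(f"pass_rate={pass_rate:.2f} (threshold {min_pass_rate})")
    return pass_rate >= min_pass_rate


if __name__ == "__main__":
    golden_traces = [  # stand-in for a versioned golden test set
        {"id": "t1", "expected": "approved", "output": "Loan approved per policy 4.2"},
        {"id": "t2", "expected": "escalate", "output": "Escalate to human reviewer"},
    ]
    evaluator = TaskSuccessEvaluator()
    results = [evaluator.evaluate(c) for c in golden_traces]
    assert release_gate(results), "regression: reliability gate failed"
```

In practice each evaluator and its golden test set would be versioned together so that gate results stay auditable, in line with the governance responsibilities above.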
Qualifications
- 6+ years in ML/AI evaluation, data/ML platform, or reliability/observability architecture.
- Strong Python + data engineering fundamentals; comfort with cloud (GCP/Azure), containers, CI/CD.
- Expertise with monitoring and tracing tools (LangSmith, OpenTelemetry, Arize, Grafana).
- Applied statistics for evaluation (sampling, confidence intervals, inter-rater agreement, calibration).
- Excellent systems thinking and cross-functional communication.
- Familiarity with multi-agent orchestration frameworks (LangGraph, Semantic Kernel, CrewAI, etc.).
Preferred Skills
- Background in regulated domains (audit, finance, healthcare).
- Experience with simulation or synthetic data generation.
- Familiarity with MCP frameworks or plugin-based service architectures.
- Understanding of agentic/HITL workflows and AI safety/reliability concerns.
Employment Details
This will start as a 4-6 month contract engagement (20 hours/week) with a clear path to full-time employment as we finalize 2026 project scopes. We’ll jointly evaluate fit, scope, and structure during that period.
Optimal start date: December 15, 2025
How to Apply
Note: This is a syndicated job post. Fractional Jobs found it on the web, but we are not working with the client directly, so we don't have control over or knowledge of the application process. To apply, click on the "View Application" button and follow the application's instructions. Let us know how it goes!
How to Get in Touch
Hit that "Request Intro" button below. Include a brief intro note and any relevant links so we can get to know you better. If we think there's a fit, we'll reach out to schedule an intro call. Looking forward!