
Reliability Data Scientist
Added
11/24/2025
This Role is Closed
Role Overview
At Reins AI, data scientists define and operationalize how we measure reliability in real-world AI systems. You’ll bridge evaluation design and data analysis, crafting the test logic behind our reliability dashboards and weekly reports. Working across regulated audit and finance contexts, you’ll translate evaluation scenarios into structured metrics, visualizations, and summaries that help our clients see what’s working, what’s drifting, and what needs triage. You’ll collaborate closely with our Solutions Architect and Reliability Lead to connect monitoring data (Grafana, LangSmith, Arize) with simulations and context-engineering workflows, building the analytical backbone of AI Ops reporting.
Responsibilities
- Partner with domain and monitoring leads to define evaluation scenarios and metrics (quality, suitability, reliability).
- Build and maintain evaluation datasets, golden traces, and error taxonomies.
- Develop and maintain weekly reliability dashboards and summary reports (Grafana, Python, SQL, or notebooks).
- Analyze evaluation results for drift, outliers, and context-dependent failures; flag issues for triage and verification loops.
- Collaborate with engineers to automate scoring and aggregation pipelines.
- Validate evaluator reliability and calibration against human judgments (a minimal sketch of this kind of check follows this list).
- Document test logic, metric definitions, and interpretation guidance for repeatability.
- Support context-engineering workflows by designing metrics that measure predictability, observability, and directability.
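
To make the calibration bullet concrete, here is a minimal sketch of the kind of check it implies: comparing an automated evaluator's pass/fail labels against human judgments using Cohen's kappa. The labels, function names, and the 0.6 threshold are illustrative assumptions, not details from this posting.

```python
# Hypothetical sketch: agreement between an automated evaluator and
# human pass/fail judgments via Cohen's kappa. All data and the 0.6
# recalibration cutoff are illustrative assumptions.
from collections import Counter

def cohens_kappa(evaluator: list[str], human: list[str]) -> float:
    """Agreement beyond chance between two label sequences."""
    assert len(evaluator) == len(human)
    n = len(human)
    observed = sum(e == h for e, h in zip(evaluator, human)) / n
    # Chance agreement from each rater's marginal label frequencies.
    e_counts, h_counts = Counter(evaluator), Counter(human)
    labels = set(evaluator) | set(human)
    expected = sum((e_counts[l] / n) * (h_counts[l] / n) for l in labels)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

evaluator_labels = ["pass", "pass", "fail", "pass", "fail", "fail"]
human_labels = ["pass", "fail", "fail", "pass", "fail", "pass"]

kappa = cohens_kappa(evaluator_labels, human_labels)
print(f"kappa = {kappa:.2f}")
if kappa < 0.6:  # illustrative cutoff for "needs recalibration"
    print("Evaluator disagrees with humans too often; flag for review.")
```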
Qualifications
- 3–6 years in data science, analytics, or ML evaluation roles.
- Experience building dashboards and automated reports (Grafana, Power BI, or similar); a rough illustration follows this list.
- Strong Python, SQL, and data-wrangling skills.
- Familiarity with evaluation design concepts (sampling, calibration, pass/fail criteria).
- Excellent communication: can turn technical data into clear, decision-ready insights.
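
As a rough illustration of the dashboard-and-reporting work described above, the short Python sketch below rolls evaluation results up into weekly pass rates per scenario and flags week-over-week drift. The file name, column names, and the 5-point threshold are assumptions for the sketch, not requirements from the role.

```python
# Hypothetical sketch: weekly pass-rate summary with a simple drift flag.
# Input file, columns, and threshold are illustrative assumptions.
import pandas as pd

# Assumed input: one row per evaluation run, with columns
# run_at (timestamp), scenario (str), passed (0/1).
runs = pd.read_csv("eval_runs.csv", parse_dates=["run_at"])

# Weekly pass rate per scenario, as a percentage.
weekly = (
    runs.set_index("run_at")
        .groupby("scenario")["passed"]
        .resample("W")
        .mean()
        .mul(100)
        .rename("pass_rate_pct")
        .reset_index()
)

# Flag scenarios whose pass rate dropped more than 5 points week over week.
weekly["delta"] = weekly.groupby("scenario")["pass_rate_pct"].diff()
print(weekly[weekly["delta"] < -5][["scenario", "run_at", "pass_rate_pct", "delta"]])
```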
Preferred Skills
- Background in AI system monitoring, LLM evaluation, or reliability engineering.
- Familiarity with LangSmith, OpenInference, or similar tracing frameworks.
- Experience with synthetic or simulated data analysis.
- Understanding of regulated domains (audit, finance, healthcare).
Employment Details
This will start as a 4–6 month contract engagement (20 hours/week) with a clear path to full-time employment as we finalize 2026 project scopes. We'll jointly evaluate fit, scope, and structure during that period.
Optimal start date: December 15, 2025
How to Apply
Note: This is a syndicated job post. Fractional Jobs found it on the web, but we are not working with the client directly, so we don't have control over or knowledge of the application process. To apply, click on the "View Application" button and follow the application's instructions. Let us know how it goes!