Anton Rasmussen antonrasmussen

Hi, I’m Anton 👋

Senior Data Engineer | Healthcare AI | Trustworthy Data Systems

I design and build reliable data and AI systems for healthcare, research, and other high-stakes domains where trust matters.

Professionally, I’m a Senior Data Engineer with deep experience in data ingestion, analytics infrastructure, orchestration, and platform reliability. Independently, I’m building projects at the boundary of healthcare data engineering, applied AI, privacy-preserving systems, and human-centered tooling.

The common thread in my work is simple: complex systems should be understandable, reproducible, and useful to the people who depend on them.

I care about systems that:

work reliably at scale,
protect sensitive data,
make uncertainty visible,
and remain understandable to both technical and non-technical users.

🧠 Current Focus

I’m currently building independent projects around:

Trustworthy AI for healthcare and public health
Privacy-preserving data engineering
Edge-ready and local-first AI systems
Clinical and biomedical data workflows
Explainable, reproducible ML pipelines
Agentic tooling for data platforms and research workflows

I’m especially interested in the practical middle ground between production engineering and applied research: turning messy data, fragile workflows, and vague questions into systems that can be tested, explained, and improved.

A long-term thread across this work is Careful Intelligence — my personal effort to explore trustworthy, privacy-aware AI systems that are useful in the real world, not just impressive in demos.

🛠️ Technical Stack

Languages: Python, SQL, Bash, Java, TypeScript, JavaScript (comfortable learning new languages when the system demands it)

Data Engineering & Orchestration: Dagster, Apache Spark, dbt, Pandas, Hadoop ecosystem

Cloud & Storage: Google Cloud Platform, GCS, BigQuery, Datastore, Azure SQL, SQL Server, Teradata, Informix

Analytics, Visualization & Geospatial: Tableau, Vega-Lite, Matplotlib, GeoPandas, OpenRefine

AI / ML Areas I Work Around: LLM evaluation, model calibration, quantization, prompt stability, de-identification, NER, reproducible experiment workflows

Practices I Care About: Idempotent pipelines · data validation · observability · schema evolution · reproducibility · de-identification · operational clarity · human review loops

🏥 Professional Experience

Senior Data Engineer – Data Ingestion

Cityblock Health | Feb 2024 – Present

Design and evolve multi-source, multi-format ingestion pipelines for HIPAA-protected healthcare data
Lead and support migration toward Dagster-based orchestration, improving reliability, extensibility, and developer velocity
Build validation, profiling, quarantine, and review workflows that balance automation with human oversight
Collaborate across analytics, clinical, implementation, and platform teams to make complex data systems more usable and trustworthy
Work on operational visibility for ingestion workflows, including review paths for unknown, anomalous, or unconfigured inbound files

Data Engineer – Data Infrastructure

Cityblock Health | Feb 2022 – Feb 2024

Built robust, idempotent ETL pipelines supporting analytics, operations, and clinical workflows
Helped improve data quality, reliability, and maintainability across shared data infrastructure
Supported ingestion and transformation patterns for healthcare data from multiple external sources

Software Engineer III – Pharmacy Data & Analytics

Walmart Global Tech | Dec 2019 – Feb 2022

Developed large-scale analytics systems for pharmacy operations
Worked with big-data platforms and production streaming/batch pipelines
Built data-driven applications and dashboards supporting operational and analytical decision-making

📌 Selected Projects

Reliability of Quantized Biomedical LLMs

Independent continuation of graduate research exploring how quantization affects biomedical language model reliability. Focus areas include calibration error, prompt stability, macro-F1, and reproducible evaluation workflows for resource-constrained AI deployment.

Secure Healthcare Data Management Framework

Spark-based ingestion framework using NER-driven masking and de-identification for secure handling of clinical text and structured healthcare data.
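
To make the masking idea concrete, here is a minimal sketch in plain Python. It is not the framework's actual code: it assumes some NER model has already produced non-overlapping `(start, end, label)` character spans, and the function name `mask_entities` is hypothetical. In a Spark job, a function like this could plausibly be applied per row.

```python
def mask_entities(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) span with a [LABEL] placeholder.

    Assumes spans do not overlap (an assumption of this sketch).
    """
    masked = text
    # Work from the rightmost span to the leftmost so that earlier
    # character offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return masked


# Illustrative input; offsets are computed rather than hard-coded.
note = "Patient John Doe seen on 2024-01-05."
name_start = note.index("John Doe")
example_spans = [(name_start, name_start + len("John Doe"), "NAME")]
masked_note = mask_entities(note, example_spans)
```

Replacing right to left is the one design choice that matters here: it avoids recomputing offsets after each substitution changes the string length.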

Enhanced Ingestion & Validation Workflow

Dagster-orchestrated ingestion system with automated validation, quarantine paths, profiling, and review workflows for unconfigured or anomalous data.
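
The routing logic behind a workflow like this can be sketched without any orchestrator. The example below is an illustration under stated assumptions, not the system's real code: the feed name, the required columns, and the `route_file` function are all hypothetical, and in practice a check like this would run inside an orchestrated step.

```python
from dataclasses import dataclass, field

# Hypothetical configuration: known feeds and the columns each must carry.
EXPECTED_COLUMNS = {
    "claims": {"member_id", "claim_id", "service_date"},
}


@dataclass
class RoutingDecision:
    destination: str  # "load", "quarantine", or "review"
    reasons: list[str] = field(default_factory=list)


def route_file(feed: str, columns: set[str], row_count: int) -> RoutingDecision:
    """Decide where an inbound file goes before anything is loaded."""
    # Unconfigured feeds go to a human review path, not to automated handling.
    if feed not in EXPECTED_COLUMNS:
        return RoutingDecision("review", [f"unconfigured feed: {feed}"])

    reasons = []
    missing = EXPECTED_COLUMNS[feed] - columns
    if missing:
        reasons.append(f"missing columns: {sorted(missing)}")
    if row_count == 0:
        reasons.append("empty file")

    # Any validation failure quarantines the file along with its reasons.
    return RoutingDecision("quarantine" if reasons else "load", reasons)
```

Returning the reasons alongside the destination keeps the decision explainable to whoever reviews a quarantined file.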

Healthcare / Public Health Research Workflows

Independent research-oriented work exploring how healthcare data, social determinants of health, environmental signals, and explainable AI can be combined into practical, reproducible analysis workflows.

Pharmacy Analytics Dashboards

Interactive dashboards and data applications supporting pharmacy stakeholders with operational and analytical decision-making.

UAP Signal

Python CLI for tracking what is actually new in UAP/UFO-related releases and coverage. It pulls from official and news sources, applies rule-based source trust logic, uses LLMs for summaries and novelty scoring, and caches results locally in SQLite to reduce repeat API spend.
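
The local caching idea can be illustrated with the standard library alone. This is a sketch under assumptions rather than the tool's actual implementation: the table name, the SHA-256 cache key, and the `summarize` callable (standing in for an LLM API call) are all hypothetical.

```python
import hashlib
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite cache; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summaries ("
        "key TEXT PRIMARY KEY, summary TEXT NOT NULL)"
    )
    return conn


def cached_summary(
    conn: sqlite3.Connection, text: str, summarize: Callable[[str], str]
) -> str:
    """Return a cached summary, calling `summarize` only on a cache miss."""
    # Key on a hash of the content so identical text is never paid for twice.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT summary FROM summaries WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]

    summary = summarize(text)  # the expensive call, e.g. an LLM request
    conn.execute(
        "INSERT INTO summaries (key, summary) VALUES (?, ?)", (key, summary)
    )
    conn.commit()
    return summary
```

Passing the summarizer in as a callable keeps the cache testable with a stub, so no API key is needed to verify the caching behavior.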

Molapse Recorder

Local-first vector stroke recorder for scientific drawings. Designed to record stylus strokes as structured vector data rather than pixels, then replay/export high-resolution transparent timelapses for use in video editing workflows. Currently in early capture-core development.
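
As a rough illustration of recording strokes as structured data rather than pixels, the sketch below models a stroke as timestamped points and derives a replay frame by filtering on time. The field names and the `visible_at` helper are assumptions made for this example, not the recorder's actual format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: float
    y: float
    t_ms: int              # time since recording start, in milliseconds
    pressure: float = 1.0  # stylus pressure, assumed normalized to 0..1


@dataclass
class Stroke:
    color: str
    width: float
    points: list[Point]


def visible_at(strokes: list[Stroke], t_ms: int) -> list[Stroke]:
    """Return the part of each stroke drawn at or before t_ms (one replay frame)."""
    frame = []
    for stroke in strokes:
        drawn = [p for p in stroke.points if p.t_ms <= t_ms]
        if drawn:
            frame.append(Stroke(stroke.color, stroke.width, drawn))
    return frame


def to_json(strokes: list[Stroke]) -> str:
    """Serialize strokes to JSON; asdict also converts the nested points."""
    return json.dumps([asdict(s) for s in strokes])
```

Because each point carries its own timestamp, a timelapse at any resolution can be rendered later by calling `visible_at` for successive times and drawing the result.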

🎓 Education & Research Background

M.S. in Computer Science — Old Dominion University
Research and project interests:
- trustworthy AI infrastructure
- healthcare data systems
- privacy-preserving ML
- edge / local-first AI
- public health analytics
- biomedical LLM evaluation
- reproducible applied research workflows

I’m no longer focused on coursework. My attention now is on building a body of independent work: practical systems, research-informed prototypes, and technical writing that connect my healthcare data engineering background with the next generation of trustworthy AI tools.

🔍 Currently Exploring

Local-first and edge-ready AI deployment
Model compression, quantization, calibration, and inference efficiency
Agentic workflows for data engineering and research
Public health analytics using explainable and reproducible methods
Hardware tinkering: Arduino, sensors, instrumentation, and small systems
Algorithms, systems fundamentals, and infrastructure design
Security and privacy as design constraints, not afterthoughts

🧭 Personal Notes

Former Persian Linguist, U.S. Army 🇺🇸
Lifelong reader; happiest in libraries and used bookstores 📚
Drums, guitar, piano
Pinball, live music, comedy, and learning things the hard way

🤝 Let’s Connect

If you’re interested in:

healthcare data platforms,
trustworthy AI,
privacy-preserving systems,
biomedical or public health data workflows,
or building tools that people can actually understand and depend on,

feel free to explore my repositories or reach out.
