Hubert Pyskło

Hubert (Marek) Pyskło

I'm a Computational Sciences & History student at Minerva University (graduating May 2027), studying across San Francisco, Seoul, Taipei, Berlin, Hyderabad, Buenos Aires, and Tokyo. I build infrastructure for evaluating and training AI agents.

Last summer I interned as a Research Engineer at Wordware (YC S24), where I built eval infrastructure for AI agents, benchmarked retrieval approaches for agent memory, and developed a filesystem-based approach that outperformed vector databases. Before that, I spent 4 months at Samsung Heavy Industries in South Korea, building an air-gapped RAG system for shipbuilding document analysis. I also built Agent Diff, an open-source platform for RL training and evaluation on APIs like Slack and Linear (pre-print on arXiv, dataset on HuggingFace), and trained a LoRA adapter for Qwen 30B via GRPO that nearly doubled its benchmark score (0.31 to 0.59), matching SOTA model performance.

Before the AI work I co-founded Econverse, CEE's student startup incubator — 3,500+ students, $500K+ raised from Microsoft, Google, and Baker McKenzie. I'm on the board of AI Consensus, where we ran responsible AI hackathons across Asia — including one in Korea with students from 23 countries, sponsored by AWS, Perplexity, and Upstage.

I like 20th-century history (Deng's reforms, the Cold War), poker (founded Minerva Poker Club), skiing, and shooting.


projects

Agent Diff: research infrastructure for evaluating AI agents and running RL training on replicas of third-party APIs (Slack, Linear, Box, Google Calendar). 108 endpoints, 224 benchmark tasks, deterministic state-diff evaluation. Multi-tenant isolation via PostgreSQL schema-level sandboxing; snapshot-based diff engine for validating multi-step agent behaviours. Trained a LoRA adapter via GRPO on Slack + Linear tasks, improving eval scores from 0.31 to 0.59. Pre-print on arXiv, dataset on HuggingFace.
Python, PyTorch, PostgreSQL, SQLAlchemy, Starlette, TypeScript SDK


experience

2025 Research Engineer Intern at Wordware (YC S24). Designed evaluation frameworks, scoring metrics, and test suites with LLM-as-judge verification. Built automated Q&A test generation from internal company data and integrated it into the CI/CD pipeline. Benchmarked retrieval architectures (vector DB, SQL, graph, filesystem) for agent memory.
2025 AI Engineer Intern at Samsung Heavy Industries, South Korea. Built air-gapped RAG system for shipbuilding ITT document analysis — no internet access, no GPT, just local inference. 92.5% accuracy across 217 risk factors.
2024– Board Member at AI Consensus. Previously Lead for Asia; built Taiwan's largest student AI conference with the Ministry of Digital Affairs. Ran a responsible AI hackathon in Korea with students from 23 countries; partners included AWS, Perplexity, and Upstage.
2024 Visiting Associate at s20 VC. Due diligence and deal sourcing across e-commerce, circular economy, and AI tools.
2022–25 Co-Founder & VP at Econverse. Built CEE's largest student startup incubator — 3,500+ students across 4 countries, $500K+ raised from Microsoft, ABB, and National Development Bank. Nationwide AI education campaign reaching 70,000+ students.
2020–22 Co-Founder at Token Studio. Crypto investment analytics; raised a $50K+ angel round from execs at Getin Noble Bank and BNP Paribas. Didn't find PMF, shut down.

education

2023–27 Minerva University, San Francisco — B.Sc. Computational Sciences, Minor in History.
2020–22 IB World School No. 1349, Poznań — International Baccalaureate, 41/45.

recognition


media