In this post I managed to get my open tabs down to 1 — the one remaining being this bedframe that I still can’t quite bring myself to close. You’ll find articles about economics, AI, and a bunch of pictures from a party I’m throwing next week. Consider becoming a paying subscriber for regular updates on Claude Code, causal inference, and random things!
SBNation makes its picks for the major NBA awards. Luka missed the 65-game threshold necessary to get any when his season ended with an injury, but still, I’ll find a way to get excited, I’m sure.
Eight things that emotionally secure couples say to one another. They’re: 1) anything special today? 2) how can I show my appreciation for you? 3) do you want my opinion or do you want me to just listen? 4) how can I help? 5) can we make some time to talk? 6) how do you think we’re doing lately? 7) what would be fun for us to do today? 8) what larger themes are you working through?
Gemini and NotebookLM continue to grow. Now they get notebooks. Here’s one person’s testimonial about what it has replaced for them. It’s apparently a useful upgrade.
One of the things about the online dating era that I think is telling is how many things that look like historically normal patterns of dating behavior become presented as stable forms of short-term relationships, or what my coauthors and I call “casual versus romantic”. The casual category is a constantly evolving dictionary of clearly defined dating concepts describing what’s basically just variations on a one-sided form of short-term relationship, while the “romance” side is basically just two things, marriage versus cohabiting, both being social expressions of one thing: life partner. Here’s another article about how many of you are just a short-term placeholder for your boo as they wait for someone better. You just get the sense that the online dating era has produced a rich and subtly distinct set of relationship categories, but more so for short-term matching than for the more semi-permanent kinds, which to me is telling: it means one of them (casual and short-term) is the far more relevant equilibrium for these platforms than the other.
Refine.ink starts to get more competition, showing me that downstream of the big AI oligopolies, the vendors who use LLMs are probably facing few barriers to entry. Really, the popularity of vibe coding with agents is basically a sign of that, because not even programming skill is required. Here’s a review of this new one.
There’s another by a Michigan Econ student too, called coarse.ink, that’s borderline free and apparently also nearly as good, if not as good or better, than Refine. Let a thousand tools to help refine our papers bloom!
If you’re my age, Gen X, born circa mid-1970s, then you maybe saw Faces of Death as a kid. Which is astonishing, if so — that our parents looked the other way while we watched what felt like a snuff film from the local VHS store. But I’m pretty sure at least one of you, like me, saw it at far too young an age for something so shocking. We didn’t have the internet, so when the cover said it was footage of real deaths, banned in dozens of countries, I believed it. And I still remember to this day, maybe 10-11 years old, turning it off before it ended and running outside to play because of how absolutely terrified I was by one scene. That movie-documentary single-handedly ended my interest in horror, which never to this day came back. And I still remember in college, maybe ten years later, randomly stumbling on an article while in the UTK library that said it was all fake, and I was stunned. I’d believed for a decade that I had personally witnessed many, many people being somehow murdered on camera, only to then learn it was entirely fake. I was relieved and shocked at the same time. Here’s a story about how they pulled such a thing off.
Speaking of scary, Anthropic’s new LLM — Mythos — continues to impress and frighten.
Boston’s marathon is Monday, and the finish line is right down the road from me. Here are pictures I took yesterday. It’s going to be a party. And I continue to try to psychologically delay the sadness that one day I’ll no longer live here.
I’m throwing a party for a friend and their friends on Monday and chose to share some Texas with them by ordering 7 pounds of BBQ brisket and 4 pounds of BBQ chicken from Texas Monthly’s beloved star Waco BBQ establishment, Helberg. My shipment got here yesterday, and here is me unboxing it excitedly. The plan for my new friend and their friends: brisket tacos, guac, and queso, as well as margaritas and mock-aritas.
And here are some of the things I need for my margarita recipe.
And then here is the calm before the storm as the tacos and chips are laid out, along with the Spanish-themed runners that I spread out all over the room.
I’ll miss this beautiful city, its lovely people, and Harvard. What a life-changing year it’s been, and a gift.
And with that, I bid adieu. Wishing all the marathoners here in town reading this my best. Maybe I’ll see you cross the finish line. Legends!
The Pipeline Worked Exactly as Designed. The Answer Was Still Wrong.
I want to tell you about the moment I stopped trusting retrieval scores.
I was running a query against a knowledge base I had built carefully. Good chunking. Hybrid search. Reranking. The top-k documents came back with cosine similarities as high as 0.86. Every indicator said the pipeline was working. I passed those documents to a QA model, got a confident answer, and moved on.
The answer was wrong.
Not hallucinated-wrong. Not retrieval-failed-wrong. The right documents had come back. Both of them. A preliminary earnings figure and the audited revision that superseded it, sitting side by side in the same context window. The model read both, chose one, and reported it with 80% confidence. It had no mechanism to tell me it had been asked to referee a dispute it was never designed to judge.
That’s the failure mode this article is about. It doesn’t show up in your retrieval metrics. It doesn’t trigger your hallucination detectors. It lives in the gap between context assembly and generation — the one step in the RAG pipeline that almost nobody evaluates.
I built a reproducible experiment to isolate it. Everything in this article runs on a CPU in about 220 MB. No API key. No cloud. No GPU. The output you see in the terminal screenshots is unmodified.
The setup is deliberately clinical. Three questions. One knowledge base containing three conflicting document pairs that make directly contradictory claims about the same facts. Retrieval is tuned to return both conflicting documents every time.
The question is not whether retrieval works. It does. The question is: what does the model do when you hand it a contradictory brief and ask it to answer with confidence?
The answer, as you will see, is that it picks a side. Silently. Confidently. Without telling you it had a choice to make.
RAG systems can retrieve the right documents but still produce incorrect answers due to hidden conflicts during context assembly. Image by Author.
Three Scenarios, Each Drawn from Production
Scenario A — The restatement nobody told the model about
A company’s Q4 earnings release reports annual revenue of $4.2M for fiscal year 2023. Three months later, external auditors restate that figure to $6.8M. Both documents live in the knowledge base. Both are indexed. When someone asks “What was Acme Corp’s revenue for fiscal year 2023?” — both come back, with similarity scores of 0.863 and 0.820 respectively.
The model answers $4.2M.
It chose the preliminary figure over the audited revision because the preliminary document scored marginally higher in retrieval. Nothing about the answer signals that a more authoritative source disagreed.
Scenario B — The policy update that arrived too late
A June 2023 HR policy mandates three days per week in-office. A November 2023 revision explicitly reverses it — fully remote is now permitted. Both documents are retrieved (similarity scores 0.806 and 0.776) when an employee asks about the current remote work policy.
The model answers with the June policy. The stricter, older rule. The one that no longer applies.
Scenario C — The API docs that never got deprecated
Version 1.2 of an API reference states a rate limit of 100 requests per minute. Version 2.0, published after an infrastructure upgrade, raises it to 500. Both are retrieved (scores 0.788 and 0.732).
The model answers 100. A developer using this answer to configure their rate limiter will throttle themselves to one-fifth of their actual allowance.
None of these are edge cases. Every production knowledge base accumulates exactly these patterns over time: financial restatements, policy revisions, versioned documentation. The pipeline has no layer that detects or handles them.
Running the Experiment
pip install -r requirements.txt
python rag_conflict_demo.py
Two models. One for embeddings, one for extractive QA. Both download automatically on first run and cache locally. Total: ~220 MB. No authentication required.
Part 1: What Naive RAG Does
Here is the unmodified terminal output from Part 1 — standard RAG with no conflict handling:
────────────────────────────────────────────────────────────────────
NAIVE | Scenario A — Numerical Conflict
────────────────────────────────────────────────────────────────────
Query      : What was Acme Corp's annual revenue for fiscal year 2023?
Answer     : $4.2M
Confidence : 80.3%
Conflict   : YES — see warning
Sources retrieved
[0.863] Q4-2023-Earnings-Release (2024-01-15)
[0.820] 2023-Annual-Report-Revised (2024-04-03)
[0.589] Company-Overview-2024 (2024-01-01)
Conflict pairs
fin-001 ↔ fin-002
numerical contradiction (topic_sim=0.83)
[Q4-2023-Earnings-Release: {'$4.2M'}] vs [2023-Annual-Report-Revised: {'$6.8M'}]
────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────
NAIVE | Scenario B — Policy Conflict
────────────────────────────────────────────────────────────────────
Query      : What's the current remote work policy for employees?
Answer     : all employees are required to be present in the office
a minimum of three days per week
Confidence : 78.3%
Conflict   : YES — see warning
Sources retrieved
[0.806] HR-Policy-June-2023 (2023-06-01)
[0.776] HR-Policy-November-2023 (2023-11-15)
[0.196] HR-Policy-November-2023 (2023-11-15)
────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────
NAIVE | Scenario C — Technical Conflict
────────────────────────────────────────────────────────────────────
Query      : What's the API rate limit for the standard tier?
Answer     : 100 requests per minute
Confidence : 81.0%
Conflict   : YES — see warning
Sources retrieved
[0.788] API-Reference-v1.2 (2023-02-10)
[0.732] API-Reference-v2.0 (2023-09-20)
[0.383] API-Reference-v2.0 (2023-09-20)
────────────────────────────────────────────────────────────────────
Retrieval succeeded every time. The QA model still answered from whichever conflicting document it attended to most — silently and confidently. Image by Author.
Three questions. Three wrong answers. Confidence between 78% and 81% on every one of them.
Notice what is happening in the logs before each response:
The conflicts are detected. They’re logged. And then, because resolve_conflicts=False, the pipeline passes the full contradictory context to the model and answers anyway. That warning goes nowhere. In a production system with no conflict detection layer, you wouldn’t even get the warning.
Why the Model Behaves This Way
This requires a moment of explanation, because the model is not broken. It’s doing exactly what it was trained to do.
deepset/minilm-uncased-squad2 is an extractive QA model. It reads a context string and selects the span with the highest combined start-logit and end-logit score. It has no output class for “I see two contradictory claims.” When the context contains both $4.2M and $6.8M, the model computes token-level scores across the entire string and selects whichever span wins.
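To make that concrete, here is a toy sketch of extractive span selection. The logits are invented for illustration, not taken from the real model; the point is only that this objective must commit to exactly one span and has no way to express “two claims disagree”:

```python
# Toy extractive QA: choose the span (i, j), i <= j, that maximizes
# start_logits[i] + end_logits[j]. The numbers below are invented so that
# the "$4.2M" span narrowly outscores the "$6.8M" span.
tokens = ["revenue", "of", "$4.2M", ";", "restated", "revenue", "is", "$6.8M"]
start_logits = [0.1, 0.0, 3.2, 0.0, 0.2, 0.1, 0.0, 3.0]
end_logits   = [0.0, 0.1, 3.1, 0.0, 0.0, 0.1, 0.0, 2.9]

best_score, best_span = float("-inf"), (0, 0)
for i, s in enumerate(start_logits):
    for j in range(i, len(end_logits)):
        if s + end_logits[j] > best_score:
            best_score, best_span = s + end_logits[j], (i, j)

answer = " ".join(tokens[best_span[0]: best_span[1] + 1])
print(answer)  # → $4.2M — one span wins; the competing claim simply disappears
```

Nothing in this scoring rule can rank “surface both claims” above either single span; the architecture can only pick a winner.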
That selection is driven by factors that have nothing to do with correctness [8]. The two primary drivers are:
Position bias. Earlier spans in the context receive marginally higher attention scores due to the encoder architecture. The preliminary document ranked higher in retrieval and therefore appeared first.
Language strength. Direct declarative statements (“revenue of $4.2M”) outscore hedged or conditional phrasing (“following restatement… is $6.8M”).
A third contributing factor is lexical alignment — spans whose vocabulary overlaps more closely with the question tokens score higher regardless of whether the underlying claim is current or authoritative.
Critically, what the model does not consider at all: source date, document authority, audit status, or whether one claim supersedes another. These signals are simply invisible to the extractive model.
The model has no mechanism to weigh source date or audit authority. It picks the span with the highest confidence score — and position wins. Image by Author.
The same dynamic plays out in generative LLMs, just less visibly — the model paraphrases rather than extracting verbatim spans, so the wrong answer is dressed in fluent prose. The mechanism is the same. Joren et al. (2025) demonstrate at ICLR 2025 that frontier models including Gemini 1.5 Pro, GPT-4o, and Claude 3.5 frequently produce incorrect answers rather than abstaining when retrieved context is insufficient to answer the query — and that this failure is not reflected in the model’s expressed confidence.
The failure is not a model deficiency. It’s an architectural gap: the pipeline has no stage that detects contradictions before handing context to generation.
Building the Conflict Detection Layer
A modular RAG pipeline architecture showing document ingestion, embedding-based retrieval, conflict detection, QA processing, and structured response generation. Image by Author.
The detector sits between retrieval and generation. It examines every pair of retrieved documents and flags contradictions before the QA model sees the context. Crucially, embeddings for all retrieved documents are computed in a single batched forward pass before pair comparison begins — each document is encoded exactly once, regardless of how many pairs it participates in.
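A minimal numpy sketch of that encode-once, compare-all-pairs idea — the vectors below are stand-ins, since a real pipeline would take the matrix straight from the embedding model’s batched forward pass:

```python
import numpy as np

def pairwise_topic_sim(embeddings: np.ndarray) -> np.ndarray:
    # Normalize each row once; every pairwise cosine similarity then falls
    # out of a single matrix product, so no document is re-encoded or
    # re-normalized per pair.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

docs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])  # 3 stand-in embeddings
sims = pairwise_topic_sim(docs)  # sims[i, j] = cosine(doc i, doc j)
```

With k retrieved documents this is one O(k·d) normalization plus one matrix multiply, rather than k² separate encoder calls.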
Two heuristics do the work.
Heuristic 1: Numerical Contradiction
Two topic-similar documents that contain non-overlapping meaningful numbers are flagged. The implementation filters out years (1900–2099) and bare small integers (1–9), which appear ubiquitously in business text and would generate constant false positives if treated as claim values.
@classmethod
def _extract_meaningful_numbers(cls, text: str) -> set[str]:
    results = set()
    for m in cls._NUM_RE.finditer(text):
        raw = m.group().strip()
        numeric_core = re.sub(r"[$€£MBK%,]", "", raw, flags=re.IGNORECASE).strip()
        try:
            val = float(numeric_core)
        except ValueError:
            continue
        if 1900 <= val <= 2099 and "." not in numeric_core:
            continue  # skip years
        if val < 10 and re.fullmatch(r"\d+", raw):
            continue  # skip bare small integers
        results.add(raw)
    return results
Applied to Scenario A: fin-001 yields {'$4.2M'}, fin-002 yields {'$6.8M'}. Empty intersection — conflict detected.
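If you want to poke at Heuristic 1 without the full class, here is a self-contained approximation. The `_NUM_RE` pattern is my assumption — the article does not show the real regex — but the year and small-integer filters mirror the method above:

```python
import re

# Assumed number pattern: optional currency symbol, digits, optional
# decimal part, optional magnitude/percent suffix.
_NUM_RE = re.compile(r"[$€£]?\d+(?:\.\d+)?[MBK%]?")

def extract_meaningful_numbers(text: str) -> set[str]:
    results = set()
    for m in _NUM_RE.finditer(text):
        raw = m.group()
        core = re.sub(r"[$€£MBK%,]", "", raw)
        try:
            val = float(core)
        except ValueError:
            continue
        if 1900 <= val <= 2099 and "." not in core:
            continue  # years are ubiquitous, not claim values
        if val < 10 and raw.isdigit():
            continue  # bare small integers are noise
        results.add(raw)
    return results

def numbers_conflict(text_a: str, text_b: str) -> bool:
    # Conflict: both texts carry claim values, and the sets share nothing.
    a, b = extract_meaningful_numbers(text_a), extract_meaningful_numbers(text_b)
    return bool(a) and bool(b) and not (a & b)

print(numbers_conflict("annual revenue of $4.2M", "restated revenue is $6.8M"))  # → True
```

Note how a shared figure suppresses the flag: two documents that both say $4.2M never fire, even if both also mention the year 2023.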
Heuristic 2: Contradiction Signal Asymmetry
Two documents discussing the same topic, where one contains contradiction tokens the other doesn’t, are flagged. The token set splits into two groups stored as separate frozenset objects:
_NEGATION_TOKENS: “not”, “never”, “no”, “cannot”, “does not”, “is not”, and related forms
These are unioned into CONTRADICTION_SIGNALS. Keeping them separate makes domain-specific tuning easy — a legal corpus might need a broader negation set; a changelog corpus might need more directional tokens.
Applied to Scenario B: hr-002 contains “no” (from “no longer required”); hr-001 does not. Asymmetry detected. Applied to Scenario C: api-002 contains “increased”; api-001 does not. Asymmetry detected.
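A sketch of Heuristic 2 under stated assumptions: the article names only _NEGATION_TOKENS, so the _DIRECTIONAL_TOKENS set below (suggested by the changelog remark and the “increased” example) is my guess at the second group, not the actual implementation:

```python
_NEGATION_TOKENS = frozenset({"not", "never", "no", "cannot"})
# Assumed second group — directional change words like the "increased"
# that fires in Scenario C.
_DIRECTIONAL_TOKENS = frozenset({"increased", "decreased", "raised", "lowered"})
CONTRADICTION_SIGNALS = _NEGATION_TOKENS | _DIRECTIONAL_TOKENS

def signal_asymmetry(text_a: str, text_b: str) -> bool:
    # Flag only when exactly one of the two texts carries a signal token.
    hits_a = CONTRADICTION_SIGNALS & set(text_a.lower().split())
    hits_b = CONTRADICTION_SIGNALS & set(text_b.lower().split())
    return bool(hits_a) != bool(hits_b)

print(signal_asymmetry(
    "employees are no longer required to work in-office",
    "employees must work in-office three days per week",
))  # → True
```

The XOR is the interesting part: two documents that both negate (or both report increases) are treated as agreeing in form, so no flag fires.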
Both heuristics require topic_sim >= 0.68 before firing. This threshold gates out unrelated documents that happen to share a number or a negation word. The 0.68 value was calibrated for this document set with all-MiniLM-L6-v2 — treat it as a starting point, not a universal constant. Different embedding models and different domains will require recalibration.
The Resolution Strategy: Cluster-Aware Recency
When conflicts are detected, the pipeline resolves them by keeping the most recently timestamped document from each conflict cluster. The key design decision is cluster-awareness.
A top-k result may contain multiple independent conflict clusters — two financial documents disagreeing on revenue and two API documents disagreeing on rate limits, all in the same top-3 result. A naive approach — keep only the single most recent document from the combined conflicting set — would silently discard the winning document from every cluster except the most recently published one overall.
Instead, the implementation builds a conflict graph, finds connected components via iterative DFS, and resolves each component independently:
@staticmethod
def _resolve_by_recency(
    contexts: list[RetrievedContext],
    conflict: ConflictReport,
) -> list[RetrievedContext]:
    # Build adjacency list
    adj: dict[str, set[str]] = defaultdict(set)
    for a_id, b_id in conflict.conflict_pairs:
        adj[a_id].add(b_id)
        adj[b_id].add(a_id)
    # Connected components via iterative DFS
    visited: set[str] = set()
    clusters: list[set[str]] = []
    for start in adj:
        if start not in visited:
            cluster: set[str] = set()
            stack = [start]
            while stack:
                node = stack.pop()
                if node not in visited:
                    visited.add(node)
                    cluster.add(node)
                    stack.extend(adj[node] - visited)
            clusters.append(cluster)
    all_conflicting_ids = set().union(*clusters) if clusters else set()
    non_conflicting = [c for c in contexts if c.document.doc_id not in all_conflicting_ids]
    resolved_docs = []
    for cluster in clusters:
        cluster_ctxs = [c for c in contexts if c.document.doc_id in cluster]
        # ISO-8601 timestamps sort lexicographically — max() gives the most recent
        best = max(cluster_ctxs, key=lambda c: c.document.timestamp)
        resolved_docs.append(best)
    return non_conflicting + resolved_docs
Non-conflicting documents pass through unchanged. Each conflict cluster contributes exactly one winner.
Part 2: What Conflict-Aware RAG Does
────────────────────────────────────────────────────────────────────
RESOLVED | Scenario A — Numerical Conflict
────────────────────────────────────────────────────────────────────
Query      : What was Acme Corp's annual revenue for fiscal year 2023?
Answer     : $6.8M
Confidence : 79.6%
Conflict   : RESOLVED
⚠ Conflicting sources detected — answer derived from most recent
document per conflict cluster.
Sources retrieved
[0.820] 2023-Annual-Report-Revised (2024-04-03)
[0.589] Company-Overview-2024 (2024-01-01)
Conflict cluster resolved: kept '2023-Annual-Report-Revised' (2024-04-03),
discarded 1 older document(s).
────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────
RESOLVED | Scenario B — Policy Conflict
────────────────────────────────────────────────────────────────────
Answer     : employees are no longer required to maintain
a fixed in-office schedule
Confidence : 78.0%
Conflict   : RESOLVED
Conflict cluster resolved: kept 'HR-Policy-November-2023' (2023-11-15),
discarded 1 older document(s).
────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────
RESOLVED | Scenario C — Technical Conflict
────────────────────────────────────────────────────────────────────
Answer     : 500 requests per minute
Confidence : 80.9%
Conflict   : RESOLVED
Conflict cluster resolved: kept 'API-Reference-v2.0' (2023-09-20),
discarded 1 older document(s).
────────────────────────────────────────────────────────────────────
A conflict-aware RAG system resolves contradictions in retrieved documents and produces correct, up-to-date answers across financial, HR, and API queries. Image by Author.
Three questions. Three correct answers. The confidence scores are virtually identical to Part 1 — 78–81% — which underscores the original point: confidence was never the signal that something had gone wrong. It still is not. The only thing that changed is the architecture.
Same retriever, same model, same query. The only difference is whether conflict detection runs before context is passed to the QA model. Image by Author.
What the Heuristics Cannot Catch
I want to be precise about the failure envelope, because a technique that understates its own limitations is not useful.
Paraphrased conflicts. The heuristics catch numerical differences and explicit contradiction tokens. They will not catch “the service was retired” versus “the service is currently available.” That is a real conflict with no numeric difference and no negation token. For those, a Natural Language Inference model — cross-encoder/nli-deberta-v3-small at ~80 MB — can score entailment versus contradiction between sentence pairs. That is the more robust path described in the academic literature (Asai et al., 2023), and the ConflictDetector class is designed to be extended at the _pair_conflict_reason method for exactly this purpose.
Non-temporal conflicts. Recency-based resolution is appropriate for versioned documents and policy updates. It is not appropriate for expert-opinion disagreements (the minority view may be correct), cross-methodology data conflicts (recency is irrelevant), or multi-perspective queries (where surfacing both views is the right response). In those cases, the ConflictReport data structure provides the raw material for building a different response — surfacing both claims, flagging for human review, or asking the user for clarification.
Scale. Pair comparison is O(k²) in the number of retrieved documents. For k=3 this is trivial; for k=20 it is still fine. For pipelines retrieving k=100 or more, pre-indexing known conflict pairs or cluster-based detection becomes necessary.
Where the Research Community Is Taking This
What you have seen here is a practical heuristic approximation of a problem that active research is attacking at a far more sophisticated level.
Cattan et al. (2025) introduced the CONFLICTS benchmark — the first specifically designed to track how models handle knowledge conflicts in realistic RAG settings. Their taxonomy identifies four conflict categories — freshness, conflicting opinions, complementary information, and misinformation — each requiring distinct model behaviour. Their experiments show that LLMs frequently fail to resolve conflicts correctly across all categories, and that explicitly prompting models to reason about potential conflicts significantly improves response quality, though substantial room for improvement remains.
Ye et al. (2026) introduced TCR (Transparent Conflict Resolution), a plug-and-play framework that disentangles semantic relevance from factual consistency via dual contrastive encoders. Self-answerability estimation gauges confidence in the model’s parametric memory, and the resulting scalar signals are injected into the generator via lightweight soft-prompt tuning. Across seven benchmarks, TCR improves conflict detection by 5–18 F1 points while adding only 0.3% parameters.
Gao et al. (2025) introduced CLEAR (Conflict-Localized and Enhanced Attention for RAG), which probes LLM hidden states at the sentence representation level to detect where conflicting information manifests internally. Their analysis shows that knowledge integration occurs hierarchically and that conflicting versus aligned information exhibits distinct distributional patterns within sentence-level representations. CLEAR uses these signals for conflict-aware fine-tuning that guides the model toward correct evidence integration.
The consistent finding across all of this work matches what this experiment demonstrates directly: retrieval quality and answer quality are distinct dimensions, and the gap between them is larger than the community has historically acknowledged.
The difference between that research and this article is 220 MB and no authentication.
What You Should Actually Do With This
1. Add a conflict detection layer before generation. The ConflictDetector class is designed to drop into an existing pipeline at the point where you assemble your context string. Even the two simple heuristics here will catch the patterns that appear most often in enterprise corpora: restatements, policy updates, versioned documentation.
2. Distinguish conflict types before resolving. A temporal conflict (use the newer document) is a different problem from a factual dispute (flag for human review) or an opinion conflict (surface both views). A single resolution strategy applied blindly creates new failure modes.
3. Log every ConflictReport. After a week of production traffic you will know how often your specific corpus generates conflicting retrieved sets, which document pairs conflict most frequently, and what query patterns trigger conflicts. That data is more actionable than any synthetic benchmark.
4. Surface uncertainty when you cannot resolve it. The right answer to an unresolvable conflict is not to pick one and hide the choice. The warning field in RAGResponse is there precisely to support responses like: “I found conflicting information on this topic. The June 2023 policy states X; the November 2023 update states Y. The November document is newer.”
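As a sketch of that fourth recommendation — the class and field names below are hypothetical stand-ins loosely modeled on the RAGResponse warning field described above, not the article’s actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConflictAwareResponse:  # hypothetical stand-in for RAGResponse
    answer: str
    warning: Optional[str] = None

def surface_conflict(topic: str, old_src: str, old_claim: str,
                     new_src: str, new_claim: str) -> ConflictAwareResponse:
    # Answer with the newer claim, but keep the disagreement visible
    # instead of silently hiding the choice.
    return ConflictAwareResponse(
        answer=new_claim,
        warning=(
            f"I found conflicting information on {topic}. "
            f"{old_src} states {old_claim}; {new_src} states {new_claim}. "
            f"The newer source is {new_src}."
        ),
    )

r = surface_conflict(
    "the remote work policy",
    "the June 2023 policy", "three days in-office",
    "the November 2023 update", "fully remote permitted",
)
print(r.warning)
```

The point is that the resolution and the disclosure travel together: downstream consumers get a usable answer and the evidence that a choice was made.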
Running the Full Demo
# Full output with INFO logs
python rag_conflict_demo.py
# Demo output only (suppress model loading logs)
python rag_conflict_demo.py --quiet
# Run unit tests without downloading models
python rag_conflict_demo.py --test
# Plain terminal output for log capture / CI
python rag_conflict_demo.py --no-color
All output shown in this article is unmodified output from a local Windows machine running Python 3.9+ in a virtual environment. The code and output are fully reproducible by any reader with the listed dependencies installed.
The Takeaway
The retrieval problem is largely solved. Vector search is fast, accurate, and well understood. The community has spent years optimising it.
The context-assembly problem is not solved. Nobody is measuring it. The gap between “correct documents retrieved” and “correct answer produced” is real, it’s universal, and it produces confident wrong answers with no signal that anything went wrong.
The fix doesn’t require a larger model, a new architecture, or more training. It requires one additional pipeline stage, running on embeddings you already have, at near-zero marginal latency.
The experiment above runs in about thirty seconds on a laptop. The question is whether your production system has the equivalent layer — and if not, what it’s silently answering wrong right now.
References
[1] Ye, H., Chen, S., Zhong, Z., Xiao, C., Zhang, H., Wu, Y., & Shen, F. (2026). Seeing through the conflict: Transparent knowledge conflict handling in retrieval-augmented generation. arXiv:2601.06842. https://doi.org/10.48550/arXiv.2601.06842
[2] Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to retrieve, generate, and critique through self-reflection. arXiv:2310.11511. https://doi.org/10.48550/arXiv.2310.11511
[3] Cattan, A., Jacovi, A., Ram, O., Herzig, J., Aharoni, R., Goldshtein, S., Ofek, E., Szpektor, I., & Caciularu, A. (2025). DRAGged into conflicts: Detecting and addressing conflicting sources in search-augmented LLMs. arXiv:2506.08500. https://doi.org/10.48550/arXiv.2506.08500
[4] Gao, L., Bi, B., Yuan, Z., Wang, L., Chen, Z., Wei, Z., Liu, S., Zhang, Q., & Su, J. (2025). Probing latent knowledge conflict for faithful retrieval-augmented generation. arXiv:2510.12460. https://doi.org/10.48550/arXiv.2510.12460
[5] Jin, Z., Cao, P., Chen, Y., Liu, K., Jiang, X., Xu, J., Li, Q., & Zhao, J. (2024). Tug-of-war between knowledge: Exploring and resolving knowledge conflicts in retrieval-augmented language models. arXiv:2402.14409. https://doi.org/10.48550/arXiv.2402.14409
[6] Joren, H., Zhang, J., Ferng, C.-S., Juan, D.-C., Taly, A., & Rashtchian, C. (2025). Sufficient context: A new lens on retrieval augmented generation systems. arXiv:2411.06037. https://doi.org/10.48550/arXiv.2411.06037
[7] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv:2005.11401. https://doi.org/10.48550/arXiv.2005.11401
[8] Mallen, A., Asai, A., Zhong, V., Das, R., Khashabi, D., & Hajishirzi, H. (2023). When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. arXiv:2212.10511. https://doi.org/10.48550/arXiv.2212.10511
[10] Xu, R., Qi, Z., Guo, Z., Wang, C., Wang, H., Zhang, Y., & Xu, W. (2024). Knowledge conflicts for LLMs: A survey. arXiv:2403.08319. https://doi.org/10.48550/arXiv.2403.08319
[11] Xie, J., Zhang, K., Chen, J., Lou, R., & Su, Y. (2023). Adaptive chameleon or stubborn sloth: Revealing the behavior of large language models in knowledge conflicts. arXiv:2305.13300. https://doi.org/10.48550/arXiv.2305.13300
Both models download automatically on first run and cache locally. No API key or Hugging Face authentication is required.
Disclosure
All code was written, debugged, and validated by the author through multiple iterations of real execution. All terminal output in this article is unmodified output from a local Windows machine running Python 3.9+ in a virtual environment. The code and output are fully reproducible by any reader with the listed dependencies installed.
The author has no financial relationship with Hugging Face, deepset, or any organisation referenced in this article. Model and library choices were made solely on the basis of size, licence, and CPU compatibility.
Most AI agents fail because of a gap between design intent and production reality. Developers often spend days building only to find that escalation logic or tool calls fail in the wild, forcing a complete restart. DataRobot Agent Assist closes this gap. It’s a natural-language CLI tool that lets you design, simulate, and validate your agent’s behavior in “rehearsal mode” before you write any implementation code. This blog will show you how to execute the full agent lifecycle, from logic design to deployment, within a single terminal session, saving you extra steps, rework, and time.
quickly develop and ship an agent from a CLI
DataRobot’s Agent Assist is a CLI tool built for designing, building, simulating, and shipping production AI agents. You run it from your terminal, describe in natural language what you want to build, and it guides the full journey from idea to deployed agent, without switching contexts, tools, or environments.
It works standalone and integrates with the DataRobot Agent Workforce Platform for deployment, governance, and monitoring. Whether you’re a solo developer prototyping a new agent or an enterprise team shipping to production, the workflow is the same: design, simulate, build, deploy.
Users are going from idea to a working agent quickly, reducing scaffolding and setup time from days to minutes.
Why not just use a general-purpose coding agent?
General AI coding agents are built for breadth. That breadth is their strength, but it's exactly why they fall short for production AI agents.
Agent Assist was built for one thing: AI agents. That focus shapes every part of the tool. The design conversation, the spec format, the rehearsal system, the scaffolding, and the deployment are all purpose-built for how agents actually work. It understands tool definitions natively. It knows what a production-grade agent needs structurally before you tell it. It can simulate behavior because it was designed to think about agents end to end.
Agent Assist compared to generic AI coding tools
The agent-building journey: from conversation to production
Step 1: Start designing your agent with a conversation
You open your terminal and run dr assist. No project setup, no config files, no templates to fill out. You immediately get a prompt asking what you want to build.
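That first step can be sketched as a terminal session. The install command comes from this article's setup step; the prompt shown after `dr assist` is illustrative, not verbatim tool output:

```shell
# Install the CLI via Homebrew (from this article's setup step).
brew install datarobot-oss/taps/dr-cli

# Launch the design conversation; no project setup or config files needed.
dr assist
# > What would you like to build?   (illustrative prompt, not actual output)
```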
Agent Assist asks follow-up questions, not only technical ones but business ones too. What systems does it need access to? What does a good escalation look like versus an unnecessary one? How should it handle a frustrated customer differently from someone with a simple question?
Guided questions and prompts help build a complete picture of the logic, not just gather a list of requirements. You can keep refining your ideas for the agent's logic and behavior in the same conversation. Add a capability, change the escalation rules, adjust the tone. The context carries forward and everything updates automatically.
For developers who want fine-grained control, Agent Assist also provides configuration options for model selection, tool definitions, authentication setup, and integration configuration, all generated directly from the design conversation.
When the picture is complete, Agent Assist generates a full specification: system prompt, model selection, tool definitions, authentication setup, and integration configuration. Something a developer can build from and a business stakeholder can actually review before any code exists. From there, that spec becomes the input to the next step: running your agent in rehearsal mode, before a single line of implementation code is written.
Step 2: Watch your agent run before you build it
This is where Agent Assist does something no other tool does.
Before writing any implementation, it runs your agent in rehearsal mode. You describe a scenario and it executes tool calls against your actual requirements, showing you exactly how the agent would behave. You see every tool that fires, every API call that gets made, every decision the agent takes.
If the escalation logic is wrong, you catch it here. If a tool returns data in an unexpected format, you see it now instead of in production. You fix it in the conversation and run it again.
You validate the logic, the integrations, and the business rules, and only move to code when the behavior is exactly what you want.
Step 3: The code that comes out is already production-ready
When you move to code generation, Agent Assist doesn't hand you a starting point. It hands you a foundation.
The agent you designed and simulated comes scaffolded with everything it needs to run in production, including OAuth authentication (no shared API keys), modular MCP server components, deployment configuration, monitoring, and testing frameworks. Out of the box, Agent Assist handles infrastructure that typically takes days to piece together.
The code is clean, documented, and follows standard patterns. You can take it and continue building in your preferred environment. But from the very first file, it's something you could show to a security team or hand off to ops with no disclaimer.
Step 4: Deploy from the same terminal you built in
When you are ready to ship, you stay in the same workflow. Agent Assist knows your environment, the models available to you, and what a valid deployment requires. It validates the configuration before touching anything.
One command. Any environment: on-prem, edge, cloud, or hybrid. Validated against your target environment's security and model constraints. The same agent that helped you design and simulate also knows how to ship it.
What teams are saying about Agent Assist
"The hardest part of AI agent development is requirement definition, especially bridging the gap between technical teams and domain experts. Agent Assist solves this interactively. A domain user can enter a rough idea, and the tool actively guides them to flesh out the missing details. Because domain experts can immediately test and validate the outputs themselves, Agent Assist dramatically shortens the time from requirement scoping to actual agent implementation."
The road ahead for Agent Assist
AI agents are becoming core business infrastructure, not experiments, and the tooling around them needs to catch up. The next phase of Agent Assist goes deeper on the parts that matter most once agents are running in production: richer tracing and evaluation so you can understand what your agent is actually doing, local experimentation so you can test changes without touching a live environment, and tighter integration with the broader ecosystem of tools your agents work with. The goal stays the same: less time debugging, more time shipping.
The hard part was never writing the code. It was everything around it: knowing what to build, validating it before it touched production, and trusting that what shipped would keep working. Agent Assist is built around that reality, and that's the direction it will keep moving in.
Get started with Agent Assist in 3 steps
Ready to ship your first production agent? Here's all you need:
1. Install the toolchain:
brew install datarobot-oss/taps/dr-cli uv pulumi/tap/pulumi go-task node git python
Android Auto has been around for more than a decade, and it's a great solution for staying connected with your phone safely while driving. Most cars support a wired Android Auto connection, so make sure you have the best cables for the job. Wireless Android Auto is increasingly common in many new car models, not just premium ones. Plus, Google keeps updating Auto with new features like support for Google Meet in Android Auto.
While it works great out of the box, there are a few settings I always tweak whenever I use a new phone with my car. These are simple toggle switches that can make a world of difference when using Android Auto daily, and all of them can be done via your phone itself. Here are the top five settings and tweaks that I recommend making once you've just set it up.
One of the most annoying default settings of Android Auto is that your music will start playing automatically as soon as your phone connects to your car. It can be very jarring at night if you had the volume turned all the way up the last time you used the car. It can also be downright embarrassing, depending on who's in the car with you, when your secret guilty-pleasure music starts blaring. Keep your dignity by simply disabling this toggle.
On your phone, head to Settings and search for Android Auto. In the settings menu, scroll down to the Start-up sub-menu and disable Start music automatically. That's it. The next time you connect your phone, your music will be in the paused state.
By default, most messaging apps and some media apps (YouTube Music, Spotify) installed on your phone will appear in the Android Auto app drawer. Any notifications from these apps will also pop up on your screen as alerts. When driving, the last thing you want is spam alerts from insurance companies distracting you from the road. There are two ways to disable notifications.
To minimize the audible pings, head to the settings menu of Android Auto, scroll down to the Messaging sub-menu, and simply disable Play message chime. This will at least stop the audible alerts, but incoming messages will still be shown. If you wish to stop those as well, you can simply disable the main sub-menu item called Show message notifications.
Another way to do this is to disable any unwanted apps from the app drawer so those notifications are automatically eliminated. From the same settings menu on your phone, scroll up to Customize Launcher, and simply uncheck the apps you don't wish to use with Android Auto.
Prevent Android Auto from automatically starting up
One of the great things about Android Auto is that it seamlessly starts up when you start your car, but not everyone might want it set up this way. Android Auto does drain your battery quicker when connected wirelessly to your car's head unit, and you might not always need it if you prefer tuning in to the radio or don't need maps for shorter trips.
Unfortunately, there's no dedicated setting for this in Android Auto itself at the moment. None of the selectable options in the Start Android Auto automatically menu lets you do this. One workaround that has proven to work for me is to disable the toggle Start Android Auto while locked.
In doing so, your phone will still connect to your car via Bluetooth (for calls), but Android Auto won't start, provided your phone is locked. If you do need to enable Android Auto in the middle of your drive, you should see a notification on your phone that says "Android Auto is available," and tapping it should launch it. You could also manually enable it via your car's dashboard.
You can give voice commands to Google Assistant to call a contact or find places along your route. If you're one of the lucky ones, chances are you already have Gemini in Android Auto, which is far superior in terms of understanding instructions. However, there will be times when Assistant is simply unable to understand what you're asking it to do because of wind noise when you're driving with the window down, or noisy kids in the back seat. In cases like these, it's good to have shortcuts for frequently used actions already set up.
In Android Auto settings on your phone, tap Customize Launcher, then tap Add a shortcut to the Launcher. You can set shortcuts for "Call a contact" or "An Assistant action." The former is self-explanatory, as it basically lets you create a one-tap call shortcut for a contact. The latter is where it gets interesting, as it lets you create any command that you'd normally give Assistant, such as "find all gas stations near me." You can then label it and even test it out (Android Auto needs to be active for this) before saving it.
Once you create these shortcuts, you can go back and reorder them to the top of the list so that they're the first icons you see when you open the app drawer in Android Auto. Now your favorite contacts and most-used actions are just a tap away.
Under the same Android Auto settings menu, enable the toggle called Taskbar widgets. This lets you control music playback from the taskbar, even if you have another app like Maps open in full screen. By default, the taskbar will show you other apps, but that means your passenger can't pause or skip a track if you have another app in full-screen mode.
These are some very simple yet important tweaks you can make to Android Auto for a safer and more pleasurable drive. Have a great tip that's missing here? I'd love to hear about it in the comments.
Researchers have taken a major step toward understanding how black holes influence the universe by directly measuring the power of their jets. Using a network of radio telescopes spread across the globe, a team led by Curtin University captured detailed images that reveal just how energetic these jets can be. The findings support long-standing theories about the role black holes play in shaping the structure of galaxies.
The study, published in Nature Astronomy, focused on Cygnus X-1, a well-known system that includes the first confirmed black hole and a massive supergiant star. Scientists determined that the jets streaming from this black hole carry an energy output equal to about 10,000 Suns.
To make this measurement, the team relied on a widely spaced array of telescopes working together as one. This setup allowed them to watch how the jets were pushed and distorted by powerful winds coming from the nearby star as the black hole traveled along its orbit. The effect is similar to how strong gusts on Earth can bend a stream of water from a fountain.
Using Stellar Winds to Reveal Jet Power
By calculating the strength of the star's wind and tracking how much the jets were deflected, researchers were able to determine the jets' power at a specific moment. This marks the first time scientists have directly measured the instantaneous energy of black hole jets rather than relying on long-term averages.
The team also measured the jets' speed, finding that they travel at roughly half the speed of light, or about 150,000 kilometers per second. Determining this speed has been a challenge for scientists for many years.
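As a quick sanity check on that figure, half the speed of light (c = 299,792.458 km/s) does indeed come out at roughly 150,000 km/s:

```shell
# c/2 in km/s; awk is used here purely for the arithmetic
awk 'BEGIN { c = 299792.458; printf "%.0f km/s\n", c / 2 }'
# prints: 149896 km/s
```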
The project was led by the Curtin Institute of Radio Astronomy (CIRA) and the Curtin node of the International Centre for Radio Astronomy Research (ICRAR), with contributions from the University of Oxford.
"Dancing Jets" Offer New Insight
Lead author Dr. Steve Prabu, who worked at CIRA during the study and is now at the University of Oxford, explained that the team used a series of images to track what he described as "dancing jets." This term refers to the way the jets shift direction repeatedly as they are pushed by the supergiant star's strong winds while both objects orbit each other.
Dr. Prabu said these observations reveal how much of the energy generated near a black hole is transferred into its surroundings, influencing the environment around it.
"A key finding from this research is that about 10 per cent of the energy released as matter falls in towards the black hole is carried away by the jets," Dr. Prabu said.
"This is what scientists usually assume in large-scale simulated models of the Universe, but it has been hard to confirm by observation until now."
Confirming Theories About Black Hole Physics
Co-author Professor James Miller-Jones, from CIRA and the Curtin node of ICRAR, noted that previous methods could only estimate jet power over extremely long periods, often spanning thousands or millions of years. This made it difficult to directly compare jet energy with the X-ray emissions produced as matter falls into a black hole.
"And because our theories suggest that the physics around black holes is very similar, we can now use this measurement to anchor our understanding of jets, whether they're from black holes 10 or 10 million times the mass of the Sun," Professor Miller-Jones said.
"With radio telescope projects such as the Square Kilometre Array Observatory currently under construction in Western Australia and South Africa, we expect to detect jets from black holes in millions of distant galaxies, and the anchor point provided by this new measurement will help calibrate their overall power output.
"Black hole jets provide an important source of feedback to the surrounding environment and are critical to understanding the evolution of galaxies."
Other collaborators on the research included the University of Barcelona, the University of Wisconsin-Madison, the University of Lethbridge and the Institute of Space Science.
Because the Application Network service is in preview, start by registering it in your account. This can take some time, but once it's registered you can install the AppNet CLI extension that's used to manage and control Application Network for your AKS clusters. You can now start to set up the ambient service mesh, either creating new clusters to use it or adding the service mesh to existing AKS deployments.
Starting from scratch is the simplest approach, as it ensures that you're working in the same tenant. AKS clusters and Application Network can be in the same resource group if you want, but it's not necessary. You're free to use separate resource groups for management.
The appnet command makes it easy to create an Application Network from the command line; all you need is a name for the network, a resource group, a location, and an identity type. Once you've run the command to create your ambient mesh, wait for the mesh to be provisioned before joining a cluster to your network. Joining again simply needs a name for the member cluster, plus the cluster's resource group and cluster name. At the same time, you define how the network will be managed, i.e. whether you manage upgrades yourself or leave Azure to manage them for you. More clusters can be added to the network the same way.
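A minimal sketch of that sequence might look like the following. Since the service is in preview, the exact subcommand and flag names shown here are assumptions for illustration, not the documented AppNet CLI surface:

```shell
# Install the preview CLI extension (extension name assumed).
az extension add --name appnet

# Create the Application Network: name, resource group, location, identity type.
az appnet create --name my-mesh --resource-group rg-mesh \
  --location westeurope --identity-type SystemAssigned

# After provisioning completes, join an existing AKS cluster as a member,
# choosing who manages mesh upgrades (flag names assumed).
az appnet member create --name my-aks-member --app-net my-mesh \
  --resource-group rg-mesh --cluster-resource-group rg-aks \
  --cluster-name my-aks --upgrade-mode Managed
```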
A few months ago, I reviewed my AI subscriptions and simply asked: "Am I actually using this?" The answer, in three out of four cases, was not really. I was paying for Adobe Firefly, ChatGPT Plus, and Perplexity Pro. Each served a different purpose on paper, but in reality, I used them interchangeably and paid three times for the same convenience. I canceled all three, switched to free alternatives, and saved about $50/month in the process.
To make sense of my decisions and their consequences, let me walk you through what I cut, what I replaced it with, and my honest take on the tradeoffs.
Adobe Firefly → Ideogram
I subscribed to Adobe Firefly because of one specific selling point: it's trained on licensed content, which, in theory, makes it safer to use commercially. For anyone producing content professionally, that matters, or at least it sounds like it should.
The reality is that I was generating AI images occasionally, not constantly. And Firefly's output, while clean, rarely blew me away. I was paying for a safety guarantee I didn't often need and image quality that free tools have largely caught up to.
I switched to Ideogram for header images, social graphics, and occasional illustrations for my travel website. In my case, I use the occasional images for 48-hour city guides and social graphics for the site's Instagram account.
The free tier gives you plenty of generations, and the quality of photorealistic and stylized prompts is impressive. I haven't once wished I were back on Firefly.
What I miss about Firefly: The commercial licensing peace of mind, if I'm being honest. If you're producing work where IP ownership is a real concern, Firefly's training data argument still holds. For most casual users, though, it's hard to justify the cost.
Verdict: I canceled Firefly because I didn't need its specific advantages and found that free alternatives were good enough for my needs. No regrets here.
ChatGPT Plus → free ChatGPT (with a caveat)
This one is trickier to talk about because I didn't just switch to free ChatGPT; I already had Claude Pro, which costs about the same as ChatGPT Plus. As such, I didn't really save $20; I redirected it. But the cancellation was still worth it.
I kept ChatGPT Plus largely out of habit. Most of my use was for quick queries that free ChatGPT could handle. The real issue wasn't its capability, but that I was using it reflexively.
As a freelance tech writer, the audit was about figuring out which tools actually added professional value.
If you're a casual ChatGPT user, the free tier covers the vast majority of everyday tasks. Summarizing, drafting, answering questions, helping you think through problems: it's all there. GPT-4o access on the free tier is rate-limited, but unless you're frequently hitting those limits, you probably won't notice.
What I miss about ChatGPT Plus: Unlimited access to GPT-4o. On heavy-use days, the rate limits on the free tier are real and sometimes frustrating. If you're a power user who leans on ChatGPT constantly throughout the day, Plus might still be worth it.
Verdict: I canceled ChatGPT Plus because it overlapped with Claude Pro, and the free version met my day-to-day needs. That made the decision easy.
Perplexity Pro → free Perplexity
This might be the most straightforward of the three cancellations. I subscribed to Perplexity Pro for its AI-powered search and extra features, but the simple truth is that I didn't use them.
I mostly used Perplexity for quick research, where I wanted synthesized answers with verifiable links. The free tier did this just as well; I rarely hit its limits, and model variations weren't critical for my needs.
The Pro upsell makes more sense if you're doing heavy, sustained research and want access to the expanded models. For normal use, the free version is one of the better free tools in the AI space, full stop.
What I miss about Perplexity Pro: Nothing, genuinely. This is the cleanest cancellation of the three.
Verdict: I canceled Perplexity Pro because the free tier offered everything I needed. No features were missed, and there were no drawbacks.
The paid AI subscription I kept
Having said all of that, I still pay for one AI subscription: Claude Pro.
To be clear, this isn't a criticism of the tools above; they each work well. However, of all my subscriptions, Claude Pro was the only one performing regular, specific tasks I couldn't get for free elsewhere. Like my colleague Andrew Grush recently discovered, I found that moving from ChatGPT to Claude Pro was right for me.
I use Claude Pro for journalism, B2B client work, coding for my website, and writing a novel, which requires managing complexity over long sessions. For journalism, client work, and a novel in progress, Claude Pro was indispensable in ways the other services above weren't.
The right subscription is different for every person. Your audit might land somewhere entirely different. The point isn't which tool wins; it's doing the audit in the first place.
What this exercise actually taught me
The real lesson wasn't about AI tools, but the gap between my perception and reality.
I subscribed to Firefly for commercial licensing, kept ChatGPT Plus out of habit, and tried Perplexity Pro for its appealing features. None of those are great reasons to keep spending money.
If you haven't looked at your AI subscriptions lately, open your credit card statement and ask yourself the same question I did: Am I actually using this? Not "could I use this" or "do I like having this," but am I using it enough to justify the cost?
You might be surprised by the answer.
An experimental treatment has doubled one-year survival rates for pancreatic cancer, one of the deadliest types of cancer, a new study reports.
The drug, called elraglusib, targets the protective net that pancreatic tumors build around themselves, thus helping immune molecules and chemotherapy better penetrate the tumors. The results of the trial showing elraglusib's safety and efficacy were published April 14 in the journal Nature Medicine.
"We're starting to see, for the first time, drugs that are effective outside of the standard chemotherapy that we have used for pancreatic cancer for the last two decades," said study co-author Dr. Devalingam Mahalingam, an oncologist at Northwestern University Feinberg School of Medicine.
A rare win in pancreatic cancer treatment
Pancreatic cancer has one of the poorest prognoses of all cancers, with newly diagnosed patients having only a 13% chance of surviving five years with the disease. Often, the problem is that pancreatic cancer is not detected until it has progressed significantly.
"Most patients, unfortunately, present with advanced disease," Mahalingam told Live Science. "There are no screening tools to pick things up earlier."
In addition, the region surrounding the tumor, called the tumor microenvironment, poses problems for pancreatic cancer treatment. "It's very dense and fibrous," he said, which reduces the effectiveness of conventional treatments for the condition, like chemotherapy.
Elraglusib addresses this problem by suppressing a protein called glycogen synthase kinase-3 beta (GSK-3 beta).
Petri-dish studies had previously shown that GSK-3 beta helps keep pancreatic cancer cells alive by boosting the activity of a protein called nuclear factor κB, which helps pancreatic cells resist programmed cell death, essentially a cellular "self-destruct" button. The drug also suppresses molecules that make the tumors resistant to the immune system.
Elraglusib boosts survival time
Past work showed that elraglusib was safe for patients with a range of cancers, but to see whether it improved pancreatic cancer outcomes, Mahalingam and his colleagues tested the drug in 286 people who had recently been diagnosed with pancreatic cancer. The patients received chemotherapy with or without elraglusib. Nearly all of the patients in the trial had advanced, metastatic disease, meaning the cancer had spread to parts of the body beyond the pancreas.
Half of the patients given elraglusib and chemotherapy were still alive after 10.1 months, while half of the patients given chemotherapy alone were still alive after 7.2 months. Of the patients given elraglusib, 42% lived a year after their diagnosis, compared with 22% of those who received only chemotherapy.
Although elraglusib increased overall survival time, it did not extend the amount of time that patients lived without their cancer growing or spreading to new areas.
The trial's protocol required patients to stop receiving treatment if their disease progressed, and Mahalingam said the trial's extremely sick cohort meant the chances of progression were high. Consequently, some patients were switched to palliative care before the drug's effects became apparent. These patients might have lived longer if they had stayed on the trial and received more doses of the drug, Mahalingam speculated.
Future treatment options
In lab-dish experiments and animal testing, elraglusib also made the environment surrounding the tumor more permeable to immune cells and chemotherapy, and it decreased tumor cells' ability to fight off immune cells once they infiltrated the tumor.
These abilities, combined with the drug's safety, could make it a useful complement to other pancreatic cancer therapies, such as immune checkpoint inhibitors, which enhance the immune system's ability to recognize and kill tumor cells, or KRAS inhibitors, which stop mutant proteins that drive tumor growth, Mahalingam said.
Elraglusib could potentially treat other types of cancer alongside chemotherapy, Mahalingam noted. A decade ago, other GSK-3 beta-targeting drugs were tested against other solid tumor cancers but never made it past the early stages of clinical testing. Therapeutic doses of those drugs couldn't reach the tumors, an obstacle elraglusib has overcome, Mahalingam said.
The new study is also notable because the drug was developed without the involvement of large pharmaceutical companies.
"It's never easy to develop a drug from an academic institution," he added. "It's nice to see some that come true."
This article is for informational purposes only and is not meant to offer medical advice.
Mahalingam, D., Shroff, R. T., Carneiro, B. A., Ji, Y., Coveler, A. L., Cervantes, A., Sahai, V., Ploquin, A., Hiret, S., LoConte, N. K., %, I. J., Lopez, C. D., Pernot, S., Kavan, P., Mulcahy, M., Carr, R., Giles, F. J., Seifarth, C., Ugolkov, A., . . . Bekaii-Saab, T. S. (2026). Elraglusib and chemotherapy in metastatic pancreatic ductal adenocarcinoma: a randomized controlled phase 2 trial. Nature Medicine. https://doi.org/10.1038/s41591-026-04327-4
Between now and the end of the year, the annual Stata Conference in the United States will take place along with five other Stata conferences in countries around the world.
Stata conferences and meetings feature talks by both Stata users and Stata developers and provide an opportunity to help shape the future of Stata development by interacting with, and providing feedback directly to, StataCorp personnel.
The talks range from longer presentations by invited speakers to shorter talks demonstrating the use of Stata in a variety of fields. Some talks are statistical in nature while others focus on data management, graphics, or programming in Stata. New enhancements to Stata created both by users and by StataCorp are often featured in talks.
Apple is presenting new research at the annual International Conference on Learning Representations (ICLR), which takes place in person in Rio de Janeiro, Brazil, from April 23 to 27. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on deep learning. Below is an overview of Apple’s participation at ICLR 2026:
Stop by the Apple booth #204 during exhibition hours: 9:30 AM – 5:30 PM (Thursday, April 23 – Saturday, April 25). All times referenced in the schedule are in BRT (local time).
Ram Ramrakhya (Georgia Institute of Technology), Harsh Agrawal, Anh Nguyen, Omar Attia, Bogdan Mazoure, Yuhao Yang, Zhe Gan, Andrew Szot, Alexander Toshev
Yixing Lao (The University of Hong Kong), Xuyang Bai, Xiaoyang Wu, Arnold Yan, Zixin Luo, Tian Fang, Danny Nahmias, Yanghai Tsin, Shiwei Li, Hengshuang Zhao (The University of Hong Kong)
POSTER
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Wei Liu (HKUST), Ruochen Zhou (HKUST), Yiyun Deng (HKUST), Yuzhen Huang (HKUST), Jaunting Liu (HKUST), Yuntian Deng (University of Waterloo), Yizhe Zhang, Junxian He (HKUST)
Shruti Palaskar, Leon Gatys, Mona Abdelrahman, Mar Jacobo, Larry Lindsey, Rutika Moharir, Gunnar Lund, Yang Xu, Navid Shiee, Jeffrey Bigham, Charlie Maalouf, Joseph Yitan Cheng
POSTER
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
Pingzhi Li (University of North Carolina, Chapel Hill), Bairu Hou (University of California, Santa Barbara), Yun Zhu, Yihao Feng, Ke Ye, Tao Lei, Zhifeng Chen, Tianlong Chen (University of North Carolina, Chapel Hill), Xianzhi Du
Deepro Choudhury (University of Oxford), Sinead Williamson, Michael Kirchhof, Adam Golinski, Yizhe Zhang, Ning Miao (City University of Hong Kong), Freddie Bickford Smith (University of Oxford), Tom Rainforth (University of Oxford)
Amir Joudaki (ETH Zurich), Giulia Lanzillotta (ETH Zurich), Mohammad Samragh Razlighi, Iman Mirzadeh, Keivan Alizadeh Vahid, Thomas Hofmann (ETH Zurich), Mehrdad Farajtabar, Fartash Faghri
Sunday, April 26
WORKSHOP
Narrative of Time Across Scales (NoTS)
9:00 AM – 5:00 PM, Time Series in the Age of Large Models Workshop 2026, 205
Wenrui Ma (University of Pennsylvania), Ran Liu, Ellen Zippi, Chris Sandino, Juri Minxha, Behrooz Mahasseni, Erdrin Azemi, Ali Moin, Eva Dyer (University of Pennsylvania)
Monday, April 27
WORKSHOP
Trading Depth for Memory: Robustifying LLMs against Cache Constraints
9:00 AM – 5:00 PM, 3rd Workshop on Test-Time Updates (TTU): Putting Updates to the Test! 2026, 206
Joao Monteiro, Anastasiia Filippova, David Grangier, Marco Cuturi
WORKSHOP
LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss
9:00 AM – 5:00 PM, Workshop on Memory for LLM-Based Agentic Systems 2026, 205
Szilvia Reka Ujvary, Michael Kirchhof, Louis Béthune, Marco Cuturi, Pierre Ablin, Joao Monteiro
This demo will showcase on-device LLM inference on a MacBook Pro with M5 Max using MLX, Apple’s open-source array framework purpose-built for Apple silicon, running a quantized frontier coding model entirely locally inside Xcode’s native development environment. The full stack (MLX, mlx-lm, and model weights) is open source, inviting the research community to build on and extend these methods independently.
This demo shows SHARP running on a set of pre-recorded images or images captured directly by the user during the demo. Visitors will experience the fast pipeline of selecting an image, processing it with SHARP, and viewing the generated 3D Gaussian point cloud on an iPad Pro with the M5 chip.
Both the MLX and SHARP demos will be available at the Apple booth during exhibition hours.
Carl Vondrick is the ICLR 2026 General Chair.
Alexander Toshev and Vladlen Koltun are Senior Area Chairs.
Carl Vondrick, Eugene Ndiaye, Fartash Faghri, Jiatao Gu, Joao Monteiro, Miguel Angel Bautista, Philipp Krähenbühl, Pierre Ablin, Shuangfei Zhai, Yizhe Zhang, and Zhe Gan are Area Chairs.
Authors: Ryan Hoque*, Peide Huang*, David J. Yoon*, Mouli Sivapurapu, Jian Zhang
Authors: Wenhui Cui†**, Christopher M. Sandino, Hadi Pouransari, Ran Liu, Juri Minxha, Ellen L. Zippi, Erdrin Azemi, Behrooz Mahasseni
Authors: Aleksei Petrenko‡, Ben Lipkin†‡**, Kevin Chen, Erik Wijmans, Marco Cusumano-Towner, Raja Giryes, Philipp Krähenbühl
Authors: Stephen Zhang**, Seyed Alireza Mousavi Hosseini**, Michal Klein, Marco Cuturi
Authors: Amin Karimi Monsefi†‡, Nikhil Bhendawade, Manuel R. Ciosici, Dominic Culver, Yizhe Zhang, Irina Belousova
Authors: Emily Cheng†, Carmen Amo Alonso‡, Federico Danieli, Arno Blaas, Luca Zappella, Pau Rodríguez, Xavier Suau
Authors: Silvia Sapora**, Devon Hjelm, Alexander Toshev, Omar Attia, Bogdan Mazoure
Authors: Sumanth Varambally**†, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu†, Ke Ye
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
Murray Kang (UCSD), Yizhe Zhang, Nikki Kuang (UCSD), Nicklas Majamaki (UCSD), Navdeep Jaitly, Yian Ma (UCSD), Lianhui Qin (UCSD)
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Wei Liu (HKUST), Ruochen Zhou (HKUST), Yiyun Deng (HKUST), Yuzhen Huang (HKUST), Jaunting Liu (HKUST), Yuntian Deng (University of Waterloo), Yizhe Zhang, Junxian He (HKUST)
Authors: Shenao Zhang†**, Donghan Yu, Yihao Feng, Bowen Jin‡**, Zhaoran Wang†, John Peebles**, Zirui Wang
Authors: Hsuan Su†, Ting-Yao Hu, Hema Swetha Koppula, Kundan Krishna, Hadi Pouransari, Cheng-Yu Hsieh, Cem Koc, Joseph Yitan Cheng, Oncel Tuzel, Raviteja Vemulapalli
Authors: Preetum Nakkiran, Arwen Bradley, Adam Goliński, Eugene Ndiaye, Michael Kirchhof, Sinead Williamson
Authors: Shruti Palaskar, Leon Gatys, Mona Abdelrahman, Mar Jacobo, Larry Lindsey, Rutika Moharir, Gunnar Lund, Yang Xu, Navid Shiee, Jeffrey Bigham, Charles Maalouf, Joseph Yitan Cheng
Authors: Jiayuan Ye, Vitaly Feldman, Kunal Talwar
Authors: Kundan Krishna, Joseph Y Cheng, Charles Maalouf, Leon A Gatys
Authors: Szilvia Ujváry†**, Louis Béthune, Pierre Ablin, João Monteiro, Marco Cuturi, Michael Kirchhof
Authors: Bingbing Wen**, Sirajul Salekin, Feiyang Kang†, Lucy Lu Wang‡, Bill Howe‡, Javier Movellan, Manjot Bilkhu
Narrative of Time Across Scales (NoTS)
Wenrui Ma (University of Pennsylvania), Ran Liu, Ellen Zippi, Chris Sandino, Juri Minxha, Behrooz Mahasseni, Erdrin Azemi, Ali Moin, Eva Dyer (University of Pennsylvania)
Authors: Skyler Seto, Pierre Ablin, Anastasiia Filippova, Jiayuan Ye†, Louis Béthune, Angelos Katharopoulos, David Grangier
Authors: Alec Helbling†**, Shruti Palaskar, Kundan Krishna, Polo Chau†, Leon Gatys‡, Joseph Yitan Cheng‡
Authors: Lorenzo Noci**, Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Moin Nabi
Trading Depth for Memory: Robustifying LLMs against Cache Constraints
Joao Monteiro, Anastasiia Filippova, David Grangier, Marco Cuturi