Saturday, April 18, 2026

Weekend tabs – by Scott Cunningham



In this post I manage to get my open tabs all the way down to 1 — the one remaining tab being this bedframe that I still can't quite bring myself to close. You'll find articles about economics, AI, and a bunch of pictures from a party I'm throwing next week. Consider becoming a paying subscriber for regular updates on Claude Code, causal inference, and random things!

The World Bank embraces industrial policy over free market capitalism.

Paul Goldsmith-Pinkham tracks the credibility revolution across fields.

Alex Albright shares her thoughts on practical advice for young economists (it applies to any adjacent type of researcher too) on things like research, work-life balance, and networking.

A birthday conference for Andy Gelman, who turns 60. Salute!

A very handsome woven MacBook case.

A rundown of Haotong Li's 13th hole at Augusta.

Mark Zuckerberg has found a use for his abandoned Metaverse and its myriad avatars. Now he sends a clone of himself to office meetings in his place.

SBNation makes its picks for the major NBA awards. Luka missed the 65-game rule necessary to get any when his season ended with an injury, but still, I'll find a way to get excited, I'm sure.

Eight things that emotionally secure couples say to one another. They are: 1) anything special today? 2) how can I show my appreciation for you? 3) do you want my opinion, or do you want me to just listen? 4) how can I help? 5) can we make some time to talk? 6) how do you think we're doing lately? 7) what would be fun for us to do today? 8) what larger themes are you working through?

Laurent Bergé's webpage. Here's a paper of his at JEBO on triadic closure in networks and collaboration between French inventors.

Here's Bergé's working paper with Grant McDermott and Kyle Butts on the wonderful fixest package in R.

Two-Michelin-starred chef François-Emmanuel Nicol will be hosting a special dinner at Rhode Island's Ocean House on Friday and Saturday this week.

Should crushes humiliate you? This author might say yes, but I couldn't get behind the paywall. Maybe you'll have more luck.

One of the first casualties of the age of large language models is here: newspapers and magazines are more aggressively blocking the Wayback Machine from archiving. I'm assuming it's the LLMs, anyway.

Gemini and NotebookLM continue to grow. Now they get notebooks. Here's one person's testimony about what it has replaced for them. It's apparently a helpful upgrade.

Cubans self-medicating under economic sanctions.

One of the things about the online dating era that I think is telling is how many things that look like historically normal patterns of dating behavior get presented as stable forms of short-term relationships, or what my coauthors and I call "casual versus romantic". The casual category is a constantly evolving dictionary of clearly defined relationship concepts describing what is basically just variations on a one-sided form of short-term relationship, while the "romance" side is basically just two things, marriage versus cohabiting, with both being social expressions of one thing: the life partner. Here's another article about how many of you are just a short-term placeholder for your boo while they wait for someone better. You get the sense that the online dating era supports a rich and subtly distinct set of relationship categories, but more so for short-term matching than for the semi-permanent kind, which to me is telling: it means one of them (casual and short-term) is the far more relevant equilibrium for these platforms than the other.

Here — build your own BMW. But I will probably just replace my 1991 Volvo 240 five-speed turbo brick with another one if she croaks. Here she is, waiting for me when I get home later this summer.

Triple/double debiased lasso.

Speaking of which, this classic letter from Gillette says screw it, they're going with five blades.

Refine.ink is starting to get more competition, showing me that downstream of the big AI oligopolies, the vendors who build on LLMs probably face few barriers to entry. Really, the popularity of vibe coding with agents is basically a sign of that, because not even programming skill is required. Here's a review of this new one.

There's another by a Michigan Econ student too, called coarse.ink, that's borderline free and apparently also nearly as good, if not as good or better, than Refine. Let a thousand tools to help refine our papers bloom!

Can forgiveness scientifically improve our well-being? A Harvard scientist says yes. Don't start tomorrow though — start right now.

If you're my age, Gen X, born circa mid-1970s, then you maybe saw Faces of Death as a kid. Which is astonishing, if so — that our parents looked the other way while we watched what felt like a snuff film from the local VHS store. But I'm pretty sure at least one of you, like me, saw it at far too young an age for something so shocking. We didn't have the internet, so when the cover said it was footage of real deaths, banned in dozens of countries, I believed it. I still remember to this day, maybe 10 or 11 years old, turning it off before it ended and running outside to play because of how thoroughly terrified I was by one scene. That movie-documentary single-handedly ended my interest in horror, which to this day never came back. And I still remember in college, maybe ten years later, randomly stumbling on an article in the UTK library that said it was all fake, and I was stunned. I'd believed for a decade that I had personally witnessed many, many people being somehow murdered on camera, only to learn it was completely fake. I was relieved and shocked at the same time. Here's a story about how they pulled such a thing off.

Speaking of scary, Anthropic's new LLM — Mythos — continues to impress and frighten.

And in case you didn't see it, Ben Affleck sold his AI startup to Netflix for a cool $600 million.

Broad Nosh Bagels in NYC.

Boston's marathon is Monday, and the finish line is right down the street from me. Here are pictures I took yesterday. It's going to be a party. And I continue to try to psychologically delay the sadness that one day I'll no longer live here.

I'm throwing a party for a friend and their friends on Monday and chose to share some Texas with them by ordering 7 pounds of bbq brisket and 4 pounds of bbq chicken from Texas Monthly's beloved Waco bbq establishment, Helberg. My shipment got here yesterday, and here is me unboxing it excitedly. The plan: brisket tacos, guac, and queso, plus margaritas and mock-aritas.

And here are some of the things I need for my margarita recipe.

And then here's the calm before the storm as the tacos and chips are laid out, along with the Spanish-themed runners that I spread all over the room.

I'll miss this beautiful city, its beautiful inhabitants, and Harvard. What a life-changing year it's been, and a gift.

Speaking of beloved friends, Andrew Baker at UC Berkeley has a new working paper on state anti-takeover provisions. He told me this is what originally inspired his famed diff-in-diff paper. There are few people whose applied research chops impress me more than Baker, to be honest.

I feel like I posted this before, but if so, here it is again. Acemoglu and others have a new working paper on AI and its effect on knowledge, and it's not the rosiest conclusion.

Sam Peltzman writes about a happiness crash following the Covid events of 2020.

More budget cut proposals for federal grants from the Trump administration, this time aimed more pointedly at the social sciences.

Lynne Kiesling writes about AI from a price theorist's perspective.

AI, Price Theory, and the Future of Economics Research

Source: ChatGPT upon reading this article…

Read more

16 days ago · 59 likes · 17 comments · Lynne Kiesling

Trump's budget requests create uncertainty.

How will the data-center buildout for AI be financed? A new working paper. Markus' Academy breaks it down.

Data Centers: Financing the AI Buildout

For the latest episode of Markus' Academy, Stijn Van Nieuwerburgh presented his recent paper, Financing the AI Buildout. Van Nieuwerburgh is the Earle W. Kazis and Benjamin Schore Professor of Real Estate and Professor of Finance at Columbia University's Graduate School of Business…

Read more

a month ago · 32 likes · 1 comment · Markus' Academy

Speaking of Markus' Academy, Paul GP discussed web scraping in one of his recent video posts.

Basil Halperin on school and teenage suicide, and specifically a graph by a group of economists. The pattern seems to hold up.

A team collected forecasts from experts and shares what they think about the near and more distant future.

This is a couple of weeks old by now, but Harvard eyes a $675 million bond sale as its financial pressures grow.

Fans apparently love the new James Bond show on Amazon Prime.

Inside the race to sell OnlyFans as its owner-founder neared the end of his life in his battle with cancer.

When one half of a romantic couple loves Claude Code and the other couldn't give a crap.

Melody Huang, a statistician at Yale, presented her work at Harvard's applied stats seminar not long ago. Here's work she's done on AI-assisted decision making.

Another recent seminar here at Harvard, this time by a legal scholar on psychedelics.

Emily Sweeney worked at the Boston Globe for a long time and is mesmerizing people on TikTok and Twitter with her Boston accent. Here is the New York Times article on it too.

Super agers, according to Harvard researchers, keep their brains young.

And with that I bid adieu. Wishing all the marathoners in town reading this my best. Maybe I'll see you cross the finish line. Legends!

Your RAG System Retrieves the Right Data — But Still Produces Wrong Answers. Here's Why (and How to Fix It).



The Pipeline Worked Exactly as Designed. The Answer Was Still Wrong.

I want to tell you about the moment I stopped trusting retrieval scores.

I was running a query against a knowledge base I had built carefully. Good chunking. Hybrid search. Reranking. The top-k documents came back with cosine similarities as high as 0.86. Every indicator said the pipeline was working. I handed those documents to a QA model, got a confident answer, and moved on.

The answer was wrong.

Not hallucinated-wrong. Not retrieval-failed-wrong. The right documents had come back. Both of them. A preliminary earnings figure and the audited revision that superseded it, sitting side by side in the same context window. The model read both, chose one, and reported it with 80% confidence. It had no mechanism to tell me it had been asked to referee a dispute it was never designed to judge.

That's the failure mode this article is about. It doesn't show up in your retrieval metrics. It doesn't trigger your hallucination detectors. It lives in the gap between context assembly and generation — the one step in the RAG pipeline that almost nobody evaluates.

I built a reproducible experiment to isolate it. Everything in this article runs on a CPU in about 220 MB. No API key. No cloud. No GPU. The output you see in the terminal screenshots is unmodified.

Full Source Code: https://github.com/Emmimal/rag-conflict-demo


What the Experiment Tests

The setup is deliberately simple. Three questions. One knowledge base containing three conflicting document pairs that make directly contradictory claims about the same fact. Retrieval is tuned to return both conflicting documents every time.

The question is not whether retrieval works. It does. The question is: what does the model do when you hand it a contradictory brief and ask it to answer with confidence?

The answer, as you will see, is that it picks a side. Silently. Confidently. Without telling you it had a choice to make.

RAG systems can retrieve the right documents but still produce incorrect answers due to hidden conflicts during context assembly. Image by Author.

Three Scenarios, Each Drawn from Production

Scenario A — The restatement nobody told the model about

A company's Q4 earnings release reports annual revenue of $4.2M for fiscal year 2023. Three months later, external auditors restate that figure to $6.8M. Both documents live in the knowledge base. Both are indexed. When someone asks "What was Acme Corp's revenue for fiscal year 2023?" — both come back, with similarity scores of 0.863 and 0.820 respectively.

The model answers $4.2M.

It chose the preliminary figure over the audited revision because the preliminary document scored marginally higher in retrieval. Nothing about the answer signals that a more authoritative source disagreed.

Scenario B — The policy update that arrived too late

A June 2023 HR policy mandates three days per week in-office. A November 2023 revision explicitly reverses it — fully remote is now permitted. Both documents are retrieved (similarity scores 0.806 and 0.776) when an employee asks about the current remote work policy.

The model answers with the June policy. The stricter, older rule. The one that no longer applies.

Scenario C — The API docs that never got deprecated

Version 1.2 of an API reference states a rate limit of 100 requests per minute. Version 2.0, published after an infrastructure upgrade, raises it to 500. Both are retrieved (scores 0.788 and 0.732).

The model answers 100. A developer using this answer to configure their rate limiter will throttle themselves to one-fifth of their actual allowance.

None of these are edge cases. Every production knowledge base accumulates exactly these patterns over time: financial restatements, policy revisions, versioned documentation. The pipeline has no layer that detects or handles them.


Running the Experiment

pip install -r requirements.txt
python rag_conflict_demo.py

requirements.txt

sentence-transformers>=2.7.0   # all-MiniLM-L6-v2  (~90 MB)
transformers>=4.40.0           # deepset/minilm-uncased-squad2 (~130 MB)
torch>=2.0.0                   # CPU-only is fine
numpy>=1.24.0
colorama>=0.4.6

Two models. One for embeddings, one for extractive QA. Both download automatically on first run and cache locally. Total: ~220 MB. No authentication required.


Phase 1: What Naive RAG Does

Here is the unmodified terminal output from Phase 1 — standard RAG with no conflict handling:

────────────────────────────────────────────────────────────────────
  NAIVE  |  Scenario A — Numerical Conflict
────────────────────────────────────────────────────────────────────
  Query       : What was Acme Corp's annual revenue for fiscal year 2023?
  Answer      : $4.2M
  Confidence  : 80.3%
  Conflict    : YES — see warning

  Sources retrieved
    [0.863] Q4-2023-Earnings-Release            (2024-01-15)
    [0.820] 2023-Annual-Report-Revised          (2024-04-03)
    [0.589] Company-Overview-2024               (2024-01-01)

  Conflict pairs
    fin-001  ↔  fin-002
    numerical contradiction  (topic_sim=0.83)
    [Q4-2023-Earnings-Release: {'$4.2M'}]  vs  [2023-Annual-Report-Revised: {'$6.8M'}]
────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────
  NAIVE  |  Scenario B — Policy Conflict
────────────────────────────────────────────────────────────────────
  Query       : What is the current remote work policy for employees?
  Answer      : all employees are required to be present in the office
                a minimum of three days per week
  Confidence  : 78.3%
  Conflict    : YES — see warning

  Sources retrieved
    [0.806] HR-Policy-June-2023                 (2023-06-01)
    [0.776] HR-Policy-November-2023             (2023-11-15)
    [0.196] HR-Policy-November-2023             (2023-11-15)
────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────
  NAIVE  |  Scenario C — Technical Conflict
────────────────────────────────────────────────────────────────────
  Query       : What is the API rate limit for the standard tier?
  Answer      : 100 requests per minute
  Confidence  : 81.0%
  Conflict    : YES — see warning

  Sources retrieved
    [0.788] API-Reference-v1.2                  (2023-02-10)
    [0.732] API-Reference-v2.0                  (2023-09-20)
    [0.383] API-Reference-v2.0                  (2023-09-20)
────────────────────────────────────────────────────────────────────
A dark-themed terminal window showing Phase 1 output from rag_conflict_demo.py. All three scenarios return wrong or outdated answers with confidence scores between 78% and 81%. Each scenario shows the conflict pair that was detected but not resolved.
Retrieval succeeded every time. The QA model still answered from whichever conflicting document it attended to most — silently and confidently. Image by Author.

Three questions. Three wrong answers. Confidence between 78% and 81% on every one of them.

Notice what is happening in the logs before each response:

09:02:20 | WARNING  | Conflict detected: {('fin-001', 'fin-002'): "numerical contradiction..."}
09:02:24 | WARNING  | Conflict detected: {('hr-001', 'hr-002'): "contradiction signal asymmetry..."}
09:02:25 | WARNING  | Conflict detected: {('api-001', 'api-002'): "contradiction signal asymmetry..."}

The conflicts are detected. They are logged. And then, because resolve_conflicts=False, the pipeline passes the full contradictory context to the model and answers anyway. That warning goes nowhere. In a production system with no conflict detection layer, you wouldn't even get the warning.


Why the Model Behaves This Way

This requires a moment of explanation, because the model is not broken. It is doing exactly what it was trained to do.

deepset/minilm-uncased-squad2 is an extractive QA model. It reads a context string and selects the span with the highest combined start-logit and end-logit score. It has no output class for "I see two contradictory claims." When the context contains both $4.2M and $6.8M, the model computes token-level scores across the entire string and selects whichever span wins.

That selection is driven by factors that have nothing to do with correctness [8]. The two primary drivers are:

Position bias. Earlier spans in the context receive marginally higher attention scores due to the encoder architecture. The preliminary document ranked higher in retrieval and therefore appeared first.

Language strength. Direct declarative statements ("revenue of $4.2M") outscore hedged or conditional phrasing ("following restatement… is $6.8M").

A third contributing factor is lexical alignment — spans whose vocabulary overlaps more closely with the question tokens score higher, regardless of whether the underlying claim is current or authoritative.

Critically, what the model does not consider at all: source date, document authority, audit status, or whether one claim supersedes another. Those signals are simply invisible to the extractive model.
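The span-selection mechanics can be made concrete with a toy score grid. This is not the real model's output; the tokens and logits below are invented purely to illustrate how an earlier, declaratively phrased span can outscore the correct one:

```python
import numpy as np

# Toy illustration of extractive QA span selection: pick the span (i, j),
# i <= j, that maximizes start_logits[i] + end_logits[j].
# These logits are fabricated; a real model emits one pair per token.
tokens = ["revenue", "of", "$4.2M", ".", "restated", "figure", "is", "$6.8M"]
start_logits = np.array([0.1, 0.0, 3.1, -1.0, 0.2, 0.1, 0.0, 2.8])
end_logits   = np.array([0.0, 0.1, 3.0, -1.0, 0.1, 0.0, 0.1, 2.7])

best_score, best_span = -np.inf, (0, 0)
for i in range(len(tokens)):
    for j in range(i, len(tokens)):
        score = start_logits[i] + end_logits[j]
        if score > best_score:
            best_score, best_span = score, (i, j)

answer = " ".join(tokens[best_span[0]:best_span[1] + 1])
print(answer)  # prints "$4.2M": the earlier, declarative span wins on raw logits
```

Nothing in this scoring loop ever sees a timestamp or a notion of document authority, which is the whole point: whichever span the encoder happens to score highest becomes the answer.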

A diagram showing the three retrieved documents concatenated into a context string. The QA model assigns a higher confidence score to the $4.2M span from the first document because it appears earlier and uses direct declarative language, even though the $6.8M figure from the second document is more recent and authoritative.
The model has no mechanism to weigh source date or audit authority. It picks the span with the highest confidence score — and position wins. Image by Author.

The same dynamic plays out in generative LLMs, but less visibly — the model paraphrases rather than extracting verbatim spans, so the wrong answer is dressed in fluent prose. The mechanism is the same. Joren et al. (2025) demonstrate at ICLR 2025 that frontier models, including Gemini 1.5 Pro, GPT-4o, and Claude 3.5, frequently produce incorrect answers rather than abstaining when the retrieved context is insufficient to answer the query — and that this failure is not reflected in the model's expressed confidence.

The failure is not a model deficiency. It is an architectural gap: the pipeline has no stage that detects contradictions before handing context to generation.


Building the Conflict Detection Layer

Diagram of a five-component RAG system architecture showing Document, KnowledgeBase, ConflictDetector, RAGPipeline, and RAGResponse with data flow and internal processing steps.
A modular RAG pipeline architecture showing document ingestion, embedding-based retrieval, conflict detection, QA processing, and structured response generation. Image by Author.

The detector sits between retrieval and generation. It examines every pair of retrieved documents and flags contradictions before the QA model sees the context. Crucially, embeddings for all retrieved documents are computed in a single batched forward pass before pair comparison begins — each document is encoded exactly once, regardless of how many pairs it participates in.
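A minimal sketch of that encode-once pattern, with random vectors standing in for all-MiniLM-L6-v2 embeddings (the 384-dim shape and the 0.68 gate follow the article; everything else is illustrative):

```python
import numpy as np

# Encode every retrieved document exactly once (one batched forward pass),
# then derive all pairwise topic similarities from the cached matrix.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(3, 384))  # stand-in for model.encode(docs)

# Normalize once; each pairwise cosine similarity is then a dot product.
unit = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
topic_sim = unit @ unit.T  # (k, k) similarity matrix, no re-encoding per pair

pairs_to_check = [
    (i, j)
    for i in range(len(unit))
    for j in range(i + 1, len(unit))
    if topic_sim[i, j] >= 0.68  # only topic-similar pairs reach the heuristics
]
print(pairs_to_check)  # random vectors are near-orthogonal, so no pair qualifies
```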

Two heuristics do the work.


Heuristic 1: Numerical Contradiction

Two topic-similar documents that contain non-overlapping meaningful numbers are flagged. The implementation filters out years (1900–2099) and bare small integers (1–9), which appear ubiquitously in business text and would generate constant false positives if treated as claim values.

@classmethod
def _extract_meaningful_numbers(cls, text: str) -> set[str]:
    results = set()
    for m in cls._NUM_RE.finditer(text):
        raw = m.group().strip()
        numeric_core = re.sub(r"[$€£MBK%,]", "", raw, flags=re.IGNORECASE).strip()
        try:
            val = float(numeric_core)
        except ValueError:
            continue
        if 1900 <= val <= 2099 and "." not in numeric_core:
            continue   # skip years
        if val < 10 and re.fullmatch(r"\d+", raw):
            continue   # skip bare small integers
        results.add(raw)
    return results

Applied to Scenario A: fin-001 yields {'$4.2M'} and fin-002 yields {'$6.8M'}. Empty intersection — conflict detected.
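Here is a self-contained version of that check. The article never shows _NUM_RE, so the regex below is a plausible stand-in for illustration, not the repo's actual pattern:

```python
import re

# Hypothetical stand-in for the class's _NUM_RE (not shown in the article).
NUM_RE = re.compile(r"[$€£]?\d+(?:[.,]\d+)?[MBK%]?", re.IGNORECASE)

def extract_meaningful_numbers(text: str) -> set[str]:
    results = set()
    for m in NUM_RE.finditer(text):
        raw = m.group().strip()
        core = re.sub(r"[$€£MBK%,]", "", raw, flags=re.IGNORECASE).strip()
        try:
            val = float(core)
        except ValueError:
            continue
        if 1900 <= val <= 2099 and "." not in core:
            continue  # years are not claim values
        if val < 10 and re.fullmatch(r"\d+", raw):
            continue  # bare small integers are too common to be claims
        results.add(raw)
    return results

a = extract_meaningful_numbers("Q4 2023 earnings release: annual revenue of $4.2M.")
b = extract_meaningful_numbers("Audited 2023 annual report: restated revenue is $6.8M.")
print(a, b, a & b)  # {'$4.2M'} {'$6.8M'} set(): disjoint, so a conflict is flagged
```

Note how "Q4" and "2023" survive the regex but are discarded by the year and small-integer filters, leaving only the dollar figures as claim values.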


Heuristic 2: Contradiction Signal Asymmetry

Two documents discussing the same topic, where one contains contradiction tokens the other doesn't, are flagged. The token set splits into two groups stored as separate frozenset objects:

  • _NEGATION_TOKENS: "not", "never", "no", "cannot", "does not", "is not", and related forms
  • _DIRECTIONAL_TOKENS: "increased", "decreased", "reduced", "eliminated", "removed", "discontinued"

These are unioned into CONTRADICTION_SIGNALS. Keeping them separate makes domain-specific tuning easy — a legal corpus might need a broader negation set; a changelog corpus might need more directional tokens.

Applied to Scenario B: hr-002 contains "no" (from "no longer required"); hr-001 doesn't. Asymmetry detected. Applied to Scenario C: api-002 contains "increased"; api-001 doesn't. Asymmetry detected.

Both heuristics require topic_sim >= 0.68 before firing. This threshold gates out unrelated documents that happen to share a number or a negation word. The 0.68 value was calibrated for this document set with all-MiniLM-L6-v2 — treat it as a starting point, not a universal constant. Different embedding models and different domains will require recalibration.
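The asymmetry check itself reduces to a few set operations. In this sketch the token sets are abbreviated, the whitespace tokenizer is a simplification, and the topic_sim >= 0.68 gate is assumed to have already passed:

```python
_NEGATION_TOKENS = frozenset({"not", "never", "no", "cannot"})
_DIRECTIONAL_TOKENS = frozenset(
    {"increased", "decreased", "reduced", "eliminated", "removed", "discontinued"}
)
CONTRADICTION_SIGNALS = _NEGATION_TOKENS | _DIRECTIONAL_TOKENS

def signal_asymmetry(doc_a: str, doc_b: str) -> bool:
    # Flag when exactly one of the two documents carries contradiction signals.
    hits_a = set(doc_a.lower().split()) & CONTRADICTION_SIGNALS
    hits_b = set(doc_b.lower().split()) & CONTRADICTION_SIGNALS
    return bool(hits_a) != bool(hits_b)

hr_001 = "All employees are required to be in the office three days per week."
hr_002 = "Employees are no longer required to maintain a fixed in-office schedule."
print(signal_asymmetry(hr_001, hr_002))  # True: only hr-002 contains "no"
```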


The Resolution Strategy: Cluster-Aware Recency

When conflicts are detected, the pipeline resolves them by keeping the most recently timestamped document from each conflict cluster. The key design decision is that resolution is cluster-aware.

A top-k result may contain multiple independent conflict clusters — two financial documents disagreeing on revenue and two API documents disagreeing on rate limits, all in the same result set. A naive approach — keeping only the single most recent document from the combined conflicting set — would silently discard the winning document from every cluster except the most recently published one overall.

Instead, the implementation builds a conflict graph, finds connected components via iterative DFS, and resolves each component independently:

@staticmethod
def _resolve_by_recency(
    contexts: list[RetrievedContext],
    conflict: ConflictReport,
) -> list[RetrievedContext]:
    # Build adjacency list
    adj: dict[str, set[str]] = defaultdict(set)
    for a_id, b_id in conflict.conflict_pairs:
        adj[a_id].add(b_id)
        adj[b_id].add(a_id)

    # Connected components via iterative DFS
    visited: set[str] = set()
    clusters: list[set[str]] = []
    for start in adj:
        if start not in visited:
            cluster: set[str] = set()
            stack = [start]
            while stack:
                node = stack.pop()
                if node not in visited:
                    visited.add(node)
                    cluster.add(node)
                    stack.extend(adj[node] - visited)
            clusters.append(cluster)

    all_conflicting_ids = set().union(*clusters) if clusters else set()
    non_conflicting = [c for c in contexts if c.document.doc_id not in all_conflicting_ids]

    resolved_docs = []
    for cluster in clusters:
        cluster_ctxs = [c for c in contexts if c.document.doc_id in cluster]
        # ISO-8601 timestamps sort lexicographically — max() gives the most recent
        best = max(cluster_ctxs, key=lambda c: c.document.timestamp)
        resolved_docs.append(best)

    return non_conflicting + resolved_docs

Non-conflicting documents pass through unchanged. Each conflict cluster contributes exactly one winner.
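To see the cluster-aware behavior end to end, here is a stripped-down standalone version on toy data. Doc and resolve_by_recency are simplified stand-ins for the repo's RetrievedContext and ConflictReport machinery, not its actual API:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    timestamp: str  # ISO-8601, so lexicographic order matches chronological order

def resolve_by_recency(docs: list[Doc], conflict_pairs: list[tuple[str, str]]) -> list[Doc]:
    # Build the conflict graph.
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in conflict_pairs:
        adj[a].add(b)
        adj[b].add(a)

    # Connected components via iterative DFS.
    visited: set[str] = set()
    clusters: list[set[str]] = []
    for start in adj:
        if start in visited:
            continue
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in visited:
                visited.add(node)
                cluster.add(node)
                stack.extend(adj[node] - visited)
        clusters.append(cluster)

    conflicting = set().union(*clusters) if clusters else set()
    keep = [d for d in docs if d.doc_id not in conflicting]
    for cluster in clusters:  # exactly one winner per independent cluster
        keep.append(max((d for d in docs if d.doc_id in cluster), key=lambda d: d.timestamp))
    return keep

docs = [Doc("fin-001", "2024-01-15"), Doc("fin-002", "2024-04-03"),
        Doc("api-001", "2023-02-10"), Doc("api-002", "2023-09-20")]
winners = resolve_by_recency(docs, [("fin-001", "fin-002"), ("api-001", "api-002")])
print(sorted(d.doc_id for d in winners))  # ['api-002', 'fin-002']
```

A global max over all four documents would have kept only fin-002 (the newest overall); resolving per cluster preserves api-002 as well.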


Phase 2: What Conflict-Aware RAG Does

────────────────────────────────────────────────────────────────────
  RESOLVED  |  Scenario A — Numerical Conflict
────────────────────────────────────────────────────────────────────
  Query       : What was Acme Corp's annual revenue for fiscal year 2023?
  Answer      : $6.8M
  Confidence  : 79.6%
  Conflict    : RESOLVED

  ⚠  Conflicting sources detected — answer derived from the most recent
     document per conflict cluster.

  Sources retrieved
    [0.820] 2023-Annual-Report-Revised          (2024-04-03)
    [0.589] Company-Overview-2024               (2024-01-01)

  Conflict cluster resolved: kept '2023-Annual-Report-Revised' (2024-04-03),
  discarded 1 older document(s).
────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────
  RESOLVED  |  Scenario B — Policy Conflict
────────────────────────────────────────────────────────────────────
  Answer      : employees are no longer required to maintain
                a fixed in-office schedule
  Confidence  : 78.0%
  Conflict    : RESOLVED

  Conflict cluster resolved: kept 'HR-Policy-November-2023' (2023-11-15),
  discarded 1 older document(s).
────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────
  RESOLVED  |  Scenario C — Technical Conflict
────────────────────────────────────────────────────────────────────
  Answer      : 500 requests per minute
  Confidence  : 80.9%
  Conflict    : RESOLVED

  Conflict cluster resolved: kept 'API-Reference-v2.0' (2023-09-20),
  discarded 1 older document(s).
────────────────────────────────────────────────────────────────────
Terminal-style diagram showing a conflict-aware RAG system correctly resolving numerical, policy, and technical conflicts across three scenarios and producing correct answers.
A conflict-aware RAG system resolves contradictions in retrieved documents and produces correct, up-to-date answers across financial, HR, and API queries. Image by Author.

Three questions. Three correct answers. The confidence scores are almost identical to Phase 1 — 78–81% — which underscores the original point: confidence was never the signal that something had gone wrong. It still is not. The only thing that changed is the architecture.

A three-row comparison table showing the same query answered by Naive RAG and Conflict-Aware RAG side by side. Naive RAG returns $4.2M, 3 days/week in-office, and 100 requests per minute — all wrong. Conflict-Aware RAG returns $6.8M, fully remote permitted, and 500 requests per minute — all correct.
Same retriever, same model, same query. The only difference is whether conflict detection runs before context is passed to the QA model. Image by Author.

What the Heuristics Cannot Catch

I want to be precise about the failure envelope, because a technique that understates its own limitations is not useful.

Paraphrased conflicts. The heuristics catch numerical differences and explicit contradiction tokens. They will not catch "the service was retired" versus "the service is currently available." That is a real conflict with no numeric difference and no negation token. For those, a Natural Language Inference model — cross-encoder/nli-deberta-v3-small at ~80 MB — can score entailment versus contradiction between sentence pairs. That is the more robust path described in the academic literature (Asai et al., 2023), and the ConflictDetector class is designed to be extended at the _pair_conflict_reason method for exactly this purpose.

Non-temporal conflicts. Recency-based resolution is appropriate for versioned documents and policy updates. It is not appropriate for expert-opinion disagreements (the minority view may be correct), cross-methodology data conflicts (recency is irrelevant), or multi-perspective queries (where surfacing both views is the right response). In those cases, the ConflictReport data structure provides the raw material to build a different response — surfacing both claims, flagging for human review, or asking the user for clarification.

Scale. Pair comparison is O(k²) in the number of retrieved documents. For k=3 this is trivial; for k=20 it is still fine. For pipelines retrieving k=100 or more, pre-indexing known conflict pairs or cluster-based detection becomes necessary.


The place the Analysis Neighborhood Is Taking This

What you’ve gotten seen here’s a sensible heuristic approximation of an issue that lively analysis is attacking at a way more refined degree.

Cattan et al. (2025) introduced the CONFLICTS benchmark, the first specifically designed to track how models handle knowledge conflicts in realistic RAG settings. Their taxonomy identifies four conflict categories (freshness, conflicting opinions, complementary information, and misinformation), each requiring distinct model behaviour. Their experiments show that LLMs frequently fail to resolve conflicts correctly across all categories, and that explicitly prompting models to reason about potential conflicts significantly improves response quality, though substantial room for improvement remains.

Ye et al. (2026) introduced TCR (Transparent Conflict Resolution), a plug-and-play framework that disentangles semantic relevance from factual consistency via dual contrastive encoders. Self-answerability estimation gauges confidence in the model's parametric memory, and the resulting scalar signals are injected into the generator via lightweight soft-prompt tuning. Across seven benchmarks, TCR improves conflict detection by 5–18 F1 points while adding only 0.3% parameters.

Gao et al. (2025) introduced CLEAR (Conflict-Localized and Enhanced Attention for RAG), which probes LLM hidden states at the sentence representation level to detect where conflicting information manifests internally. Their analysis shows that knowledge integration happens hierarchically and that conflicting versus aligned information exhibits distinct distributional patterns within sentence-level representations. CLEAR uses these signals for conflict-aware fine-tuning that guides the model toward accurate evidence integration.

The consistent finding across all of this work matches what this experiment demonstrates directly: retrieval quality and answer quality are distinct dimensions, and the gap between them is larger than the community has historically acknowledged.

The difference between that research and this article is 220 MB and no authentication.


What You Should Actually Do With This

1. Add a conflict detection layer before generation. The ConflictDetector class is designed to drop into an existing pipeline at the point where you assemble your context string. Even the two simple heuristics here will catch the patterns that appear most often in enterprise corpora: restatements, policy updates, versioned documentation.

2. Distinguish conflict types before resolving. A temporal conflict (use the newer document) is a different problem from a factual dispute (flag for human review) or an opinion conflict (surface both views). A single resolution strategy applied blindly creates new failure modes.

3. Log every ConflictReport. After a week of production traffic you will know how often your specific corpus generates conflicting retrieved sets, which document pairs conflict most frequently, and which query patterns trigger conflicts. That data is more actionable than any synthetic benchmark.

4. Surface uncertainty when you cannot resolve it. The right answer to an unresolvable conflict is not to pick one and hide the choice. The warning field in RAGResponse is there precisely to support responses like: "I found conflicting information on this topic. The June 2023 policy states X; the November 2023 update states Y. The November document is more recent."
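Taken together, the four recommendations can be sketched in a few lines. `ConflictReport` and `RAGResponse` below are minimal stand-ins for the article's classes; the field names and conflict-type labels are illustrative assumptions, not the article's actual API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ConflictReport:
    conflict_type: str            # "temporal" | "factual" | "opinion" (illustrative labels)
    doc_pair: Tuple[str, str]

@dataclass
class RAGResponse:
    answer: str
    warning: Optional[str] = None

conflict_log: List[ConflictReport] = []   # point 3: log every report

def resolve(report: ConflictReport, older: str, newer: str) -> RAGResponse:
    """Route each conflict type to its own strategy; never hide the choice."""
    conflict_log.append(report)
    if report.conflict_type == "temporal":
        # point 2: prefer the newer document; point 4: surface the choice
        return RAGResponse(
            answer=newer,
            warning=("I found conflicting information on this topic. "
                     f"{older} {newer} The second document is more recent."))
    if report.conflict_type == "factual":
        return RAGResponse(answer="", warning="Factual dispute: flagged for human review.")
    return RAGResponse(answer=f"{older} {newer}",
                       warning="Opinion conflict: surfacing both views.")

resp = resolve(ConflictReport("temporal", ("policy_2023_06", "policy_2023_11")),
               older="The June 2023 policy states X.",
               newer="The November 2023 update states Y.")
print(resp.warning)
print(Counter(r.conflict_type for r in conflict_log))  # point 3: aggregate over traffic
```

The Counter over the log is the cheap version of the production analysis described in point 3: which conflict types dominate, and how often.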


Running the Full Demo

# Full output with INFO logs
python rag_conflict_demo.py

# Demo output only (suppress model loading logs)
python rag_conflict_demo.py --quiet

# Run unit tests without downloading models
python rag_conflict_demo.py --test

# Plain terminal output for log capture / CI
python rag_conflict_demo.py --no-color

All output shown in this article is unmodified output from a local Windows machine running Python 3.9+ in a virtual environment. The code and output are fully reproducible by any reader with the listed dependencies installed.


The Takeaway

The retrieval problem is largely solved. Vector search is fast, accurate, and well understood. The community has spent years optimising it.

The context-assembly problem is not solved. Nobody is measuring it. The gap between "correct documents retrieved" and "correct answer produced" is real, it is universal, and it produces confident wrong answers with no signal that anything went wrong.

The fix does not require a larger model, a new architecture, or more training. It requires one additional pipeline stage, running on embeddings you already have, at zero marginal latency.

The experiment above runs in about thirty seconds on a laptop. The question is whether your production system has the equivalent layer, and if not, what it is silently answering wrong right now.


References

[1] Ye, H., Chen, S., Zhong, Z., Xiao, C., Zhang, H., Wu, Y., & Shen, F. (2026). Seeing through the conflict: Transparent knowledge conflict handling in retrieval-augmented generation. arXiv:2601.06842. https://doi.org/10.48550/arXiv.2601.06842

[2] Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to retrieve, generate, and critique through self-reflection. arXiv:2310.11511. https://doi.org/10.48550/arXiv.2310.11511

[3] Cattan, A., Jacovi, A., Ram, O., Herzig, J., Aharoni, R., Goldshtein, S., Ofek, E., Szpektor, I., & Caciularu, A. (2025). DRAGged into conflicts: Detecting and addressing conflicting sources in search-augmented LLMs. arXiv:2506.08500. https://doi.org/10.48550/arXiv.2506.08500

[4] Gao, L., Bi, B., Yuan, Z., Wang, L., Chen, Z., Wei, Z., Liu, S., Zhang, Q., & Su, J. (2025). Probing latent knowledge conflict for faithful retrieval-augmented generation. arXiv:2510.12460. https://doi.org/10.48550/arXiv.2510.12460

[5] Jin, Z., Cao, P., Chen, Y., Liu, K., Jiang, X., Xu, J., Li, Q., & Zhao, J. (2024). Tug-of-war between knowledge: Exploring and resolving knowledge conflicts in retrieval-augmented language models. arXiv:2402.14409. https://doi.org/10.48550/arXiv.2402.14409

[6] Joren, H., Zhang, J., Ferng, C.-S., Juan, D.-C., Taly, A., & Rashtchian, C. (2025). Sufficient context: A new lens on retrieval augmented generation systems. arXiv:2411.06037. https://doi.org/10.48550/arXiv.2411.06037

[7] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv:2005.11401. https://doi.org/10.48550/arXiv.2005.11401

[8] Mallen, A., Asai, A., Zhong, V., Das, R., Khashabi, D., & Hajishirzi, H. (2023). When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. arXiv:2212.10511. https://doi.org/10.48550/arXiv.2212.10511

[9] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv:1908.10084. https://doi.org/10.48550/arXiv.1908.10084

[10] Xu, R., Qi, Z., Guo, Z., Wang, C., Wang, H., Zhang, Y., & Xu, W. (2024). Knowledge conflicts for LLMs: A survey. arXiv:2403.08319. https://doi.org/10.48550/arXiv.2403.08319

[11] Xie, J., Zhang, K., Chen, J., Lou, R., & Su, Y. (2023). Adaptive chameleon or stubborn sloth: Revealing the behavior of large language models in knowledge conflicts. arXiv:2305.13300. https://doi.org/10.48550/arXiv.2305.13300

Full Source Code: https://github.com/Emmimal/rag-conflict-demo


Models Used

Both models download automatically on first run and are cached locally. No API key or Hugging Face authentication is required.


Disclosure

All code was written, debugged, and validated by the author through multiple iterations of real execution.

The author has no financial relationship with Hugging Face, deepset, or any organisation referenced in this article. Model and library choices were made solely on the basis of size, licence, and CPU compatibility.

design and run an agent in rehearsal – before building it


Most AI agents fail because of a gap between design intent and production reality. Developers often spend days building only to find that escalation logic or tool calls fail in the wild, forcing a complete restart. DataRobot Agent Assist closes this gap. It is a natural language CLI tool that lets you design, simulate, and validate your agent's behavior in "rehearsal mode" before you write any implementation code. This blog will show you how to execute the full agent lifecycle from logic design to deployment within a single terminal session, saving you extra steps, rework, and time.

quickly develop and ship an agent from a CLI

DataRobot's Agent Assist is a CLI tool built for designing, building, simulating, and shipping production AI agents. You run it from your terminal, describe in natural language what you want to build, and it guides the full journey from idea to deployed agent, without switching contexts, tools, or environments.

It works standalone and integrates with the DataRobot Agent Workforce Platform for deployment, governance, and monitoring. Whether you are a solo developer prototyping a new agent or an enterprise team shipping to production, the workflow is the same: design, simulate, build, deploy.

Users are going from idea to a working agent quickly, cutting scaffolding and setup time from days to minutes.

Why not just use a general-purpose coding agent?

General-purpose AI coding agents are built for breadth. That breadth is their strength, but it is exactly why they fall short for production AI agents.

Agent Assist was built for one thing: AI agents. That focus shapes every part of the tool. The design conversation, the spec format, the rehearsal system, the scaffolding, and the deployment are all purpose-built for how agents actually work. It understands tool definitions natively. It knows what a production-grade agent needs structurally before you tell it. It can simulate behavior because it was designed to think about agents end to end.

Agent Assist compared to generic AI coding tools

The agent-building journey: from conversation to production

Step 1: Start designing your agent with a conversation

You open your terminal and run dr assist. No project setup, no config files, no templates to fill out. You immediately get a prompt asking what you want to build.

Agent Assist asks follow-up questions, not only technical ones but business ones too. What systems does it need access to? What does a good escalation look like versus an unnecessary one? How should it handle a frustrated customer differently from someone with a simple question?

Guided questions and prompts help build a complete picture of the logic, not just gather a list of requirements. You can keep refining your ideas for the agent's logic and behavior in the same conversation. Add a capability, change the escalation rules, adjust the tone. The context carries forward and everything updates automatically.

For developers who want fine-grained control, Agent Assist also provides configuration options for model selection, tool definitions, authentication setup, and integration configuration, all generated directly from the design conversation.

When the picture is complete, Agent Assist generates a full specification: system prompt, model selection, tool definitions, authentication setup, and integration configuration. Something a developer can build from and a business stakeholder can actually review before any code exists. From there, that spec becomes the input to the next step: running your agent in rehearsal mode, before a single line of implementation code is written.

Step 2: Watch your agent run before you build it

This is where Agent Assist does something no other tool does.

Before writing any implementation, it runs your agent in rehearsal mode. You describe a scenario and it executes tool calls against your actual requirements, showing you exactly how the agent would behave. You see every tool that fires, every API call that gets made, every decision the agent takes.

If the escalation logic is wrong, you catch it here. If a tool returns data in an unexpected format, you see it now instead of in production. You fix it in the conversation and run it again.

You validate the logic, the integrations, and the business rules, and only move to code when the behavior is exactly what you want.

Step 3: The code that comes out is already production-ready

When you move to code generation, Agent Assist doesn't hand you a starting point. It hands you a foundation.

The agent you designed and simulated comes scaffolded with everything it needs to run in production, including OAuth authentication (no shared API keys), modular MCP server components, deployment configuration, monitoring, and testing frameworks. Out of the box, Agent Assist handles infrastructure that typically takes days to piece together.

The code is clean, documented, and follows standard patterns. You can take it and continue building in your preferred environment. But from the very first file, it is something you could show to a security team or hand off to ops with no disclaimer.

Step 4: Deploy from the same terminal you built in

When you are ready to ship, you stay in the same workflow. Agent Assist knows your environment, the models available to you, and what a valid deployment requires. It validates the configuration before touching anything.

One command. Any environment: on-prem, edge, cloud, or hybrid. Validated against your target environment's security and model constraints. The same agent that helped you design and simulate also knows how to ship it.

What teams are saying about Agent Assist

“The hardest part of AI agent development is requirement definition, especially bridging the gap between technical teams and domain experts. Agent Assist solves this interactively. A domain user can enter a rough idea, and the tool actively guides them to flesh out the missing details. Because domain experts can directly test and validate the outputs themselves, Agent Assist dramatically shortens the time from requirement scoping to actual agent implementation.”

The road ahead for Agent Assist

AI agents are becoming core enterprise infrastructure, not experiments, and the tooling around them needs to catch up. The next phase of Agent Assist goes deeper on the parts that matter most once agents are running in production: richer tracing and evaluation so you can understand what your agent is actually doing, local experimentation so you can test changes without touching a live environment, and tighter integration with the broader ecosystem of tools your agents work with. The goal remains the same: less time debugging, more time shipping.

The hard part was never writing the code. It was everything around it: knowing what to build, validating it before it touched production, and trusting that what shipped would keep working. Agent Assist is built around that reality, and that is the direction it will keep moving in.

Get started with Agent Assist in 3 steps

Ready to ship your first production agent? Here's all you need:

1.  Install the toolchain:

brew install datarobot-oss/taps/dr-cli uv pulumi/tap/pulumi go-task node git python

2.  Install Agent Assist:

dr plugin install assist

3.  Launch:

dr assist

Full documentation, examples, and advanced configuration are in the Agent Assist documentation.

Android Auto: 5 essential tips and fixes


Android Auto has been around for more than a decade, and it is a great solution for staying connected with your phone safely while driving. Most cars support a wired Android Auto connection, so make sure you have the best cables for the job. Wireless Android Auto is increasingly common in many new car models, not just premium ones. Plus, Google keeps updating Auto with new features like support for Google Meet in Android Auto.

While it works great out of the box, there are a few settings I always tweak whenever I use a new phone with my car. These are simple toggle switches that can make a world of difference when using Android Auto daily, and all of them can be done via your phone itself. Here are my top 5 settings and tweaks that I recommend making once you've just set it up.

(Image credit: Roydon Cerejo / Android Central)

One of the most annoying default settings of Android Auto is that your music will start playing automatically as soon as your phone connects to your car. It can be very jarring at night if you had the volume turned all the way up the last time you used the car. It can also be downright embarrassing, depending on who is in the car with you, when your secret guilty-pleasure music starts blaring. Keep your dignity by simply disabling this toggle switch.


Black hole jets measured for first time and rival the power of 10,000 suns


Researchers have taken a major step toward understanding how black holes influence the universe by directly measuring the power of their jets. Using a network of radio telescopes spread across the globe, a team led by Curtin University captured detailed images that reveal just how energetic these jets can be. The findings support long-standing theories about the role black holes play in shaping the structure of galaxies.

The study, published in Nature Astronomy, focused on Cygnus X-1, a well-known system that includes the first confirmed black hole and a massive supergiant star. Scientists determined that the jets streaming from this black hole carry an energy output equal to about 10,000 Suns.

To make this measurement, the team relied on a widely spaced array of telescopes working together as one. This setup allowed them to observe how the jets were pushed and distorted by powerful winds coming from the nearby star as the black hole traveled along its orbit. The effect is similar to how strong gusts on Earth can bend a stream of water from a fountain.

Using Stellar Winds to Reveal Jet Power

By calculating the strength of the star's wind and tracking how much the jets were deflected, researchers were able to determine the jets' power at a specific moment. This marks the first time scientists have directly measured the instantaneous power of black hole jets rather than relying on long-term averages.

The team also measured the jets' speed, finding that they travel at roughly half the speed of light, or about 150,000 kilometers per second. Determining this speed has been a challenge for scientists for many years.
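The quoted figure is simple arithmetic to sanity-check: half the speed of light does round to roughly 150,000 kilometers per second.

```python
C_KM_S = 299_792.458          # speed of light in km/s
half_c = C_KM_S / 2
print(round(half_c))          # 149896, i.e. "about 150,000 kilometers per second"
```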

The project was led by the Curtin Institute of Radio Astronomy (CIRA) and the Curtin node of the International Centre for Radio Astronomy Research (ICRAR), with contributions from the University of Oxford.

“Dancing Jets” Offer New Insight

Lead author Dr. Steve Prabu, who worked at CIRA during the study and is now at the University of Oxford, explained that the team used a series of images to track what he described as “dancing jets.” This term refers to the way the jets shift direction repeatedly as they are pushed by the supergiant star's strong winds while both objects orbit each other.

Dr. Prabu said these observations reveal how much of the energy generated near a black hole is transferred into its surroundings, influencing the environment around it.

“A key finding from this research is that about 10 per cent of the energy released as matter falls in towards the black hole is carried away by the jets,” Dr. Prabu said.

“This is what scientists usually assume in large-scale simulated models of the Universe, but it has been hard to confirm by observation until now.”

Confirming Theories About Black Hole Physics

Co-author Professor James Miller-Jones, from CIRA and the Curtin node of ICRAR, noted that previous methods could only estimate jet power over extremely long periods, often spanning thousands or millions of years. This made it difficult to directly compare jet energy with the X-ray emissions produced as matter falls into a black hole.

“And because our theories suggest that the physics around black holes is very similar, we can now use this measurement to anchor our understanding of jets, whether they are from black holes 10 or 10 million times the mass of the Sun,” Professor Miller-Jones said.

“With radio telescope projects such as the Square Kilometre Array Observatory currently under construction in Western Australia and South Africa, we expect to detect jets from black holes in millions of distant galaxies, and the anchor point provided by this new measurement will help calibrate their overall power output.

“Black hole jets provide an important source of feedback to the surrounding environment and are critical to understanding the evolution of galaxies.”

Other collaborators on the research included the University of Barcelona, the University of Wisconsin-Madison, the University of Lethbridge and the Institute of Space Science.

Ease into Azure Kubernetes Application Network

As the Application Network service is in preview, start by registering it in your account. This can take some time, but once it is registered you can install the AppNet CLI extension that is used to manage and control Application Network for your AKS clusters. You can now start to set up the ambient service mesh, either creating new clusters to use it, or adding the service mesh to existing AKS deployments.

Starting from scratch is the simplest approach, as it ensures that you are working in the same tenant. AKS clusters and Application Network can be in the same resource group if you want, but it is not necessary. You are free to use separate resource groups for management.

The appnet command makes it easy to create an Application Network from the command line; all you need is a name for the network, a resource group, a location, and an identity type. Once you have run the command to create your ambient mesh, wait for the mesh to be provisioned before joining a cluster to your network. This again simply needs a resource group, a name for the member cluster, and its resource group and cluster name. At the same time, you define how the network will be managed, i.e. whether you manage upgrades yourself or leave Azure to manage them for you. More clusters can be added to the network the same way.

Why I just canceled ChatGPT Plus and two other AI subscriptions


Bryan Wolfe / Android Authority

A few months ago, I reviewed my AI subscriptions and simply asked: “Am I actually using this?” The answer, in three out of four cases, was not really. I was paying for Adobe Firefly, ChatGPT Plus, and Perplexity Pro. Each served a different purpose on paper, but in reality, I used them similarly and paid three times for the same convenience. I canceled all three, switched to free alternatives, and saved about $50/month in the process.

To make sense of my decisions and their consequences, let me walk you through what I cut, what I replaced it with, and my honest take on the tradeoffs.


Adobe Firefly → Ideogram

An image of an orange cat in a suit in an office meeting a bunch of other cats, generated by AI using Ideogram

I subscribed to Adobe Firefly because of one specific selling point: it is trained on licensed content, which, in theory, makes it safer to use commercially. For anyone producing content professionally, that matters, or at least it sounds like it should.

The reality is that I was generating AI images occasionally, not constantly. And Firefly's output, while clean, rarely blew me away. I was paying for a safety guarantee I didn't often need and image quality that free tools have largely caught up to.

I switched to Ideogram for header images, social graphics, and occasional illustrations for my travel website. In my case, I use the occasional images for 48-hour city guides and social graphics for the site's Instagram account.

The free tier gives you plenty of generations, and the quality of photorealistic and stylized prompts is impressive. I haven't once wished I were back on Firefly.

What I miss about Firefly: The commercial licensing peace of mind, if I'm being honest. If you're producing work where IP ownership is a real concern, Firefly's training data argument still holds. For most casual users, though, it's hard to justify the cost.

Verdict: I canceled Firefly because I didn't need its specific advantages and found that free alternatives were good enough for my needs. No regrets here.

ChatGPT Plus → free ChatGPT (with a caveat)

ChatGPT example

Bryan Wolfe / Android Authority

This one is trickier to talk about because I didn't just switch to free ChatGPT; I already had Claude Pro, which costs about the same as ChatGPT Plus. As such, I didn't really save $20; I redirected it. But the cancellation was still worth it.

I kept ChatGPT Plus largely out of habit. Most of my use was for quick queries that free ChatGPT could handle. The real issue wasn't its capability, but that I was using it automatically.

As a freelance tech writer, I treated the audit as a way to identify which tools actually added professional value.

If you're a casual ChatGPT user, the free tier covers the vast majority of everyday tasks. Summarizing, drafting, answering questions, helping you think through problems: it's all there. GPT-4o access on the free tier is rate-limited, but unless you're regularly hitting those limits, you probably won't notice.

What I miss about ChatGPT Plus: Unlimited access to GPT-4o. On heavy-use days, the rate limits on the free tier are real and sometimes frustrating. If you're a power user who leans on ChatGPT constantly throughout the day, Plus may still be worth it.

Verdict: I canceled ChatGPT Plus because it overlapped with Claude Pro, and the free version met my day-to-day needs. That made the decision easier.

Perplexity Professional → free Perplexity

Perplexity example

Bryan Wolfe / Android Authority

This might be the most straightforward of the three cancellations. I subscribed to Perplexity Pro for its AI-powered search and extra features, but the simple truth is that I didn't use them.

I mostly used Perplexity for quick research, where I wanted synthesized answers with verifiable links. The free tier did this just as well; I rarely hit its limits, and model variations weren't important for my needs.

The Pro upsell makes more sense if you're doing heavy, sustained research and want access to the expanded models. For normal use, the free version is one of the better free tools in the AI space, full stop.

What I miss about Perplexity Pro: Nothing, genuinely. This is the cleanest cancellation of the three.

Verdict: I canceled Perplexity Pro because the free tier offered everything I needed. No features were missed, and there were no drawbacks.

The paid AI subscription I kept

Claude example

Bryan Wolfe / Android Authority

Having said all of that, I still pay for one AI subscription: Claude Pro.

To be clear, this isn't a criticism of the tools above; they each work well. However, of all my subscriptions, Claude Pro was the only one performing regular, specific tasks I couldn't get for free elsewhere. Like my colleague Andrew Grush recently discovered, I found that moving from ChatGPT to Claude Pro was good for me.

I use Claude Pro for journalism, B2B client work, coding for my website, and writing a novel, which requires managing complexity over long sessions. For journalism, client work, and a novel in progress, Claude Pro was indispensable in ways the other services above weren't.

The right subscription is different for every person. Your audit might land somewhere entirely different. The point isn't which tool wins; it's doing the audit in the first place.

What this exercise actually taught me

claude homepage

Calvin Wankhede / Android Authority

The real lesson wasn't about AI tools, but about the gap between my perception and reality.

I subscribed to Firefly for commercial licensing, kept ChatGPT Plus out of habit, and tried Perplexity Pro for its appealing features. None of those are great reasons to keep spending money.

If you haven't looked at your AI subscriptions lately, open your credit card statement and ask yourself the same question I did: Am I actually using this? Not “could I use this” or “do I like having this,” but am I using it enough to justify the cost?

You might be surprised by the answer.


Experimental drug doubles one-year survival in pancreatic cancer

An experimental treatment has doubled one-year survival rates for pancreatic cancer, one of the deadliest types of cancer, a new study reports.

The drug, called elraglusib, targets the protective web that pancreatic tumors build around themselves, thus helping immune molecules and chemotherapy better penetrate the tumors. The results of the trial showing elraglusib's safety and efficacy were published April 14 in the journal Nature Medicine.

Stata Conferences and Meetings Update

Between now and the end of the year, the annual Stata Conference in the United States will take place along with five other Stata meetings in countries around the world.

Stata conferences and meetings feature talks by both Stata users and Stata developers and provide an opportunity to help shape the future of Stata development by interacting with and providing feedback directly to StataCorp personnel.

The talks range from longer presentations by invited speakers to shorter talks demonstrating the use of Stata in a variety of fields. Some talks are statistical in nature while others focus on data management, graphics, or programming in Stata. New enhancements to Stata created both by users and by StataCorp are often featured in talks.

The full schedule of upcoming meetings is:

2011 Mexican Stata Users Group meeting
May 12, 2011

2011 German Stata Users Group meeting
July 1, 2011

Stata Conference Chicago 2011
July 14–15, 2011

2011 UK Stata Users Group meeting
September 15–16, 2011

2011 Spanish Stata Users Group meeting
September 22, 2011

2011 Nordic and Baltic Stata Users Group meeting
November 11, 2011

Click on any meeting title for more information, including programs and registration details.



International Conference on Learning Representations (ICLR) 2026



Apple is presenting new research at the annual International Conference on Learning Representations (ICLR), which takes place in person in Rio de Janeiro, Brazil, from April 23 to 27. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on deep learning. Below is an overview of Apple’s participation at ICLR 2026:


Stop by the Apple booth #204 during exhibition hours: 9:30 AM – 5:30 PM (Thursday, April 23 – Saturday, April 25). All times referenced in the schedule are in BRT (local time).

Schedule

Thursday, April 23

Friday, April 24

Saturday, April 25

Sunday, April 26

Monday, April 27

Local LLM inference on Apple silicon with MLX

This demo will showcase on-device LLM inference on a MacBook Pro with M5 Max using MLX, Apple’s open-source array framework purpose-built for Apple silicon, running a quantized frontier coding model entirely locally within Xcode’s native development environment. The full stack (MLX, mlx-lm, and model weights) is open source, inviting the research community to build on and extend these methods independently.

SHARP

This demo shows SHARP running on a set of pre-recorded images or images captured directly by the user during the demo. Visitors will experience the fast process of selecting an image, processing it with SHARP, and viewing the generated 3D Gaussian point cloud on an iPad Pro with the M5 chip.

Both the MLX and SHARP demos will be available at the Apple booth during exhibition hours.

Carl Vondrick is the ICLR 2026 General Chair.

Alexander Toshev and Vladlen Koltun are Senior Area Chairs.

Carl Vondrick, Eugene Ndiaye, Fartash Faghri, Jiatao Gu, Joao Monteiro, Miguel Angel Bautista, Philipp Krähenbühl, Pierre Ablin, Shuangfei Zhai, Yizhe Zhang, and Zhe Gan are Area Chairs.

Arno Blaas is a Workshop Co-Organizer, and Nicholas Apostoloff and Niv Sivakumar are Workshop Reviewers for “I Can’t Believe It’s Not Better: Challenges in Applied Deep Learning (ICBINB) 2026.”

Shirley Zou is a Workshop Co-Organizer for “AI with Recursive Self-Improvement 2026.”

Adam Golinski, Anastasiia Filippova, Andrew Silva, Andrew Szot, Arnav Kundu, Arno Blaas, Artem Sevastopolsky, Arwen Bradley, Barry-John Theobald, Chen Chen, Cheng-Yu Hsieh, Devon Hjelm, Gregor Bachmann, Honor Chen, Luca Zappella, Manjot Bilkhu, Meng Cao, Michael Kirchhof, Miguel Sarabia, Mohamad Shahbazi, Nicholas Apostoloff, Nikhil Bhendawade, Nivedha Sivakumar, Noam Elata, Omar Attia, Parth Thakkar, Parshin Shojaee, Peter Grasch, Ping Wang, Ran Liu, Raviteja Vemulapalli, Richard Bai, Roy Xie, Vikramjit Mitra, Vimal Thilak, and Zijin Gu are Reviewers.

Authors: Silin Gao**, Antoine Bosselut†, Samy Bengio, Emmanuel Abbe

Authors: Ming Gui†‡*, Johannes Schusterbauer†‡*, Timy Phan†‡, Felix Krause†‡, Josh Susskind, Miguel Angel Bautista, Björn Ommer†‡

Adaptive Thinking: Large Language Models Know When to Think in Latent Space

Authors: Deepro Choudhury†, Sinead Williamson, Adam Goliński, Ning Miao‡, Freddie Bickford Smith†, Michael Kirchhof, Yizhe Zhang, Tom Rainforth†

Authors: Amir Joudaki†, Giulia Lanzillotta†, Mohammad Samragh Razlighi, Iman Mirzadeh, Keivan Alizadeh, Thomas Hofmann†, Mehrdad Farajtabar, Fartash Faghri

Authors: Santiago Cuervo†, Skyler Seto, Maureen de Seyssel, Richard He Bai, Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly, Zakaria Aldeneh

Authors: Bruno Mlodozeniec†**, Pierre Ablin, Louis Béthune, Dan Busbridge, Michal Klein, Jason Ramapuram, Marco Cuturi

Authors: Aleksandr Dremov**†, David Grangier, Angelos Katharopoulos, Awni Hannun

Authors: Huangjie Zheng, Shansan Gong‡**, Ruixiang Zhang, Tianrong Chen, Jiatao Gu, Mingyuan Zhou†**, Navdeep Jaitly, Yizhe Zhang

Authors: Vishaal Udandarao†‡, Zhiyun Lu, Xuankai Chang, Yongqiang Wang, Violet Z. Yao, Albin Madapally Jose, Fartash Faghri, Josh Gardner, Chung-Cheng Chiu

Authors: Shansan Gong†**, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong†**, Yizhe Zhang

Authors: Ryan Hoque*, Peide Huang*, David J. Yoon*, Mouli Sivapurapu, Jian Zhang

Authors: Wenhui Cui†**, Christopher M. Sandino, Hadi Pouransar, Ran Liu, Juri Minxha, Ellen L. Zippi, Erdrin Azemi, Behrooz Mahasseni

Authors: Aleksei Petrenko‡, Ben Lipkin†‡**, Kevin Chen, Erik Wijmans, Marco Cusumano-Towner, Raja Giryes, Philipp Krähenbühl

Authors: Stephen Zhang**, Seyed Alireza Mousavi Hosseini**, Michal Klein, Marco Cuturi

Authors: Amin Karimi Monsefi†‡, Nikhil Bhendawade, Manuel R. Ciosici, Dominic Culver, Yizhe Zhang, Irina Belousova

Authors: Emily Cheng†, Carmen Amo Alonso‡, Federico Danieli, Arno Blaas, Luca Zappella, Pau Rodríguez, Xavier Suau

Authors: Silvia Sapora**, Devon Hjelm, Alexander Toshev, Omar Attia, Bogdan Mazoure

Authors: Sumanth Varambally**†, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu†, Ke Ye

LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning

Murray Kang (UCSD), Yizhe Zhang, Nikki Kuang (UCSD), Nicklas Majamaki (UCSD), Navdeep Jaitly, Yian Ma (UCSD), Lianhui Qin (UCSD)

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Wei Liu (HKUST), Ruochen Zhou (HKUST), Yiyun Deng (HKUST), Yuzhen Huang (HKUST), Jaunting Liu (HKUST), Yuntian Deng (University of Waterloo), Yizhe Zhang, Junxian He (HKUST)

Authors: Shenao Zhang†**, Donghan Yu, Yihao Feng, Bowen Jin‡**, Zhaoran Wang†, John Peebles**, Zirui Wang

Authors: Hsuan Su†, Ting-Yao Hu, Hema Swetha Koppula, Kundan Krishna, Hadi Pouransari, Cheng-Yu Hsieh, Cem Koc, Joseph Yitan Cheng, Oncel Tuzel, Raviteja Vemulapalli

Authors: Yixing Lao†**, Xuyang Bai, Xiaoyang Wu†, Nuoyuan Yan, Zixin Luo, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Shiwei Li, Hengshuang Zhao†

Authors: Jen-Hao Rick Chang‡, Xiaoming Zhao‡, Dorian Chan, Oncel Tuzel

Authors: Yanghao Li, Rui Qian, Bowen Pan, Haotian Zhang, Haoshuo Huang, Bowen Zhang†**, Jialing Tong, Haoxuan You, Xianzhi Du, Zhe Gan, Hyunjik Kim, Chao Jia, Zhenbang Wang, Yinfei Yang, Mingfei Gao, Zi-Yi Dou, Wenze Hu, Chang Gao, Dongxu Li, Philipp Dufter, Zirui Wang, Guoli Yin, Zhengdong Zhang, Chen Chen, Yang Zhao, Ruoming Pang†**, Zhifeng Chen

Authors: Fartash Faghri*, Pavan Kumar Anasossalu Vasu*, Cem Koc, Vaishaal Shankar†, Alexander Toshev, Oncel Tuzel, Hadi Pouransari

Authors: Sarah Ball†, Greg Gluch‡, Shafi Goldwasser‡, Frauke Kreuter†§, Omer Reingold¶, Guy N. Rothblum

Authors: Federico Danieli, Pau Rodriguez, Miguel Sarabia, Xavier Suau, Luca Zappella

Authors: Hadi Pouransari, David Grangier, C Thomas, Michael Kirchhof, Oncel Tuzel

Authors: Xianhang Li†, Chen Huang, Chun-Liang Li, Eran Malach, Josh Susskind, Vimal Thilak, Etai Littwin

Authors: Alex Fang†**, Thomas Voice, Ruoming Pang**, Ludwig Schmidt†, Tom Gunter**

Authors: Jakub Krajewski**, Amitis Shidani, Dan Busbridge, Sam Wiseman, Jason Ramapuram

Authors: Mohammad Hossein Amani†, Aryo Lotfi†, Nicolas Mario Baldwin†, Samy Bengio, Mehrdad Farajtabar, Emmanuel Abbé*, Robert West*†

Authors: Ram Ramrakhya**, Andrew Szot, Omar Attia, Yuhao Yang, Anh Nguyen, Bogdan Mazoure, Zhe Gan, Harsh Agrawal, Alexander Toshev

Authors: Michael Kirchhof, Luca Füger†, Adam Goliński, Eeshan Gunesh Dhekane, Arno Blaas, Seong Joon Oh‡, Sinead Williamson

Authors: Angie Boggust†, Donghao Ren, Yannick Assogba, Dominik Moritz, Arvind Satyanarayan†, Fred Hohman

Authors: Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter, Vladlen Koltun

Authors: Yuyang Wang, Jiarui Lu**, Navdeep Jaitly, Josh Susskind, Miguel Angel Bautista

Authors: Zitong Yang†‡, Aonan Zhang‡, Hong Liu†, Tatsunori Hashimoto†, Emmanuel Candès†, Chong Wang, Ruoming Pang

Authors: Gregor Bachmann, Yichen Jiang, Seyed Mohsen Moosavi Dezfooli, Moin Nabi

Authors: Eran Malach, Omid Saremi, Sinead Williamson, Arwen Bradley, Aryo Lotfi, Emmanuel Abbe, Josh Susskind, Etai Littwin

Authors: Preetum Nakkiran, Arwen Bradley, Adam Goliński, Eugene Ndiaye, Michael Kirchhof, Sinead Williamson

Authors: Shruti Palaskar, Leon Gatys, Mona Abdelrahman, Mar Jacobo, Larry Lindsey, Rutika Moharir, Gunnar Lund, Yang Xu, Navid Shiee, Jeffrey Bigham, Charles Maalouf, Joseph Yitan Cheng

Authors: Jiayuan Ye, Vitaly Feldman, Kunal Talwar

Authors: Kundan Krishna, Joseph Y Cheng, Charles Maalouf, Leon A Gatys

Authors: Szilvia Ujváry†**, Louis Béthune, Pierre Ablin, João Monteiro, Marco Cuturi, Michael Kirchhof

Authors: Bingbing Wen**, Sirajul Salekin, Feiyang Kang†, Lucy Lu Wang‡, Bill Howe‡, Javier Movellan, Manjot Bilkhu

Narrative of Time Across Scales (NoTS)

Wenrui Ma (University of Pennsylvania), Ran Liu, Ellen Zippi, Chris Sandino, Juri Minxha, Behrooz Mahasseni, Erdrin Azemi, Ali Moin, Eva Dyer (University of Pennsylvania)

Authors: Skyler Seto, Pierre Ablin, Anastasiia Filippova, Jiayuan Ye†, Louis Béthune, Angelos Katharopoulos, David Grangier

Authors: Alec Helbling†**, Shruti Palaskar, Kundan Krishna, Polo Chau†, Leon Gatys‡, Joseph Yitan Cheng‡

Authors: Lorenzo Noci**, Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Moin Nabi

Trading Depth for Memory: Robustifying LLMs against Cache Constraints

Joao Monteiro, Anastasiia Filippova, David Grangier, Marco Cuturi