
Here’s the best way to find Play Store apps



Andy Walker / Android Authority

It never fails to amaze me how bad Google is, and continues to be, at search. For a company that made its bones in this category, its apps lack the ability to adequately help me find what I’m looking for, and I can’t think of a worse example than the Google Play Store.

I’ve talked about several of the Play Store’s flaws in a recent feature, but its search functionality — or lack thereof — is still one issue I can’t seem to shake. Thankfully, others have noticed this problem and developed their own pioneering solutions. Recently, I discovered an app that makes finding Android apps much easier; it’s called App Finder.


Take notes, Google: This is Play Store search done right!


Andy Walker / Android Authority

Beyond offering a smattering of “sponsored” apps at the top of the search results page and ads wherever there’s room, the Play Store’s search results lack fine-tuning. You get what Google wants you to get. It’s not about searching for a specific app, but about prompting Google to suggest what it believes you want. One can understand how this could lead to plenty of annoyance.

App Finder takes a different approach by handing control back to the user. It’s a third-party search engine that reliably sniffs out Play Store apps based on user-defined criteria. It’s effectively the Advanced Search tool in Google Search for the Play Store, and it pulls details from an app’s title, package name, summary, description, and even the changelog. This makes it farther-reaching than the solution baked into Google’s own app.

After installation, you’ll notice one glaring issue with App Finder: it is by no means a Material 3 Expressive-adhering app. The best way to describe App Finder’s UI is to say it’s aesthetically humble. Its home screen heavily privileges function over form, but I soon realized this is a positive decision. After all, it’s a search tool, not an app beauty pageant entrant. It’s user-customizable too, allowing me to select which search modifiers I want readily available.

Find apps easily using filters, modifiers, and scales

If I want to search for a specific app, I can type my query into the search bar — just like in the Play Store. However, unlike Google’s surface-level solution, I don’t receive a screenful of ads masquerading as results. Despite this advantage, I don’t use App Finder this way, at least if I’m looking for a popular or major app. Instead, its real power is in its search modifiers, filters, and sliding scales.

Let me provide an example. If I want to find all apps with the term “music” in their title, I type this into the search bar. Easy. Without any filters applied, App Finder highlights more than 55,600 results. (Hilariously, this is already more information than the Play Store provides.) However, if I move the Average user rating to at least 4.7, increase the Number of ratings slider to 100k, and toggle on the No ads selector, I can whittle that down to 30 results. This level of control over which apps appear in your search results is simply not achievable in the Play Store.

There is a ridiculously wide range of filters and sliders in App Finder. Users can control the total number of downloads and the in-app price range, or filter by release and update dates, or by genre, and even sort by current downloads/month. That latter detail is a far smarter metric for charting app popularity than the Play Store’s trending charts. You can also view this data by country or see aggregated worldwide data.


Andy Walker / Android Authority

I also regularly use App Finder to hunt down apps that are on sale — it’s a nifty cheat code for finding games. Once you crank up the rating to the high 4s, you can often find some pretty entertaining titles lurking within the Play Store. It’s equally useful for finding new titles that have just hit the Play Store’s gaming category.

Somehow, if you need even more control, you can take advantage of App Finder’s search operators. I can include a + to demand that results include a specific word in their title or summary, discarding other results. Then there’s the use of quotes, which allows me to search for exact, lengthier strings. OR allows me to search for results that may contain multiple terms (for instance, video player OR media player). App Finder dedicates an entire in-app help page to these operators. I really appreciate this little detail, and I still visit this page from time to time.

The remaining question marks


Andy Walker / Android Authority

As much as I like App Finder, it does have areas in need of further polish. Yes, the UI is functional and intuitive, but it still feels decidedly dated. The results screen could do a better job of highlighting relevant information, and the text spacing and weight feel too compressed for my liking.

Then there are the development question marks. Although App Finder is still fully functional, it hasn’t seen an update since October 2024. More worrying, the developer has noted that it’s planning “AI-based natural language search,” which makes my skin crawl, and would do the same for other anti-AI consumers, too. Does App Finder need such an addition? No, I don’t believe it does. Does anything with the word AI in it continue to sell? Publishers and developers sure seem to think so.


Finally, it’s worth noting that App Finder does have a paid component. Although the bulk of the app is available to consumers for free (at least at the time of this writing), you’ll have to upgrade to unlock additional filters, enhanced scale ranges, and the ability to export data for further study. Given its functionality, App Finder isn’t all that expensive either. The Ultimate tier is around $10 for a lifetime unlock, and provided that Google doesn’t make any major changes to the Play Store that disrupt App Finder’s functionality, this doesn’t seem too steep.

With Google continually delivering a sub-par search experience on the Play Store, it’s becoming more necessary than convenient to use a third-party search app like App Finder to discover interesting sales, trending titles, and hidden apps. Yes, it sure has its quirks, but I cannot knock the incredible customizability, functionality, and information that App Finder provides.


New triple-drug therapy stops pancreatic cancer in its tracks, a mouse study finds


A triple-drug therapy for pancreatic cancer has shown promise in early animal tests, pointing to a potential new treatment for a disease with a notoriously low survival rate.

Considered one of the deadliest common cancers, pancreatic cancer has a five-year relative survival rate of around 13% — meaning roughly 87% of people with the cancer are expected to die within five years of diagnosis. That survival rate can plummet to as low as 1% for people diagnosed in very late stages of the disease.

20+ Gemma Project Ideas for Beginners (Easy to Build)



Students learn new technologies better and have more fun when they can make real things. Gemma lets beginners use ideas in real life without having to know a lot about code. These projects are easy, fun, and meant to increase your confidence while teaching you how to think creatively and solve problems. This guide provides 20+ Gemma project ideas for beginners, complete with objectives, tools, expected outcomes, and platform examples. Each project is easy to understand and implement, making it perfect for students who want to learn by doing.

Why Is Gemma Good for Learning and Building Projects?

Gemma is a great tool for beginners who want to learn while building real projects. It simplifies complex tasks, allowing students to focus on creativity and understanding instead of struggling with difficult programming. By using Gemma, students can quickly test ideas, see instant results, and learn from hands-on experience.

This tool also helps develop important skills such as problem-solving, logical thinking, and structured planning. Students can experiment safely, make mistakes, and improve their projects step by step. Overall, Gemma is a helpful, easy-to-use, and engaging platform that makes learning fun and encourages students to take charge of their own projects.

20+ Gemma Project Ideas for Beginners

1. Text Summarizer Tool

Objectives

  • Convert long text into concise summaries
  • Improve reading and comprehension skills

Tools Used

Expected Outcome

  • Enter a paragraph and get a short, easy-to-read summary

Platform Examples

  • Web apps, educational portals


2. Question Answering App

Objectives

  • Provide instant answers to student questions
  • Encourage curiosity and quick learning

Tools Used

Expected Outcome

  • Accurate, clear answers from input text

Platform Examples

  • Web interface, desktop apps

3. Study Notes Generator

Objectives

  • Convert long chapters into bullet-style revision notes
  • Improve memory retention and learning efficiency

Tools Used

Expected Outcome

  • Short, readable notes for easy revision

Platform Examples

  • Web dashboards, mobile apps

4. Grammar Correction Tool

Objectives

  • Identify grammar mistakes in text
  • Improve sentence clarity and writing skills

Tools Used

Expected Outcome

  • Corrected sentences ready for assignments

Platform Examples

  • Online editors, word processors

5. Essay Topic Generator

Objectives

  • Generate ideas for essays and writing tasks
  • Encourage creativity and structured thinking

Tools Used

Expected Outcome

  • Relevant essay topics for students

Platform Examples

  • Educational websites, writing apps

6. Chat-Style Learning Assistant

Objectives

  • Provide interactive academic help
  • Answer common questions in real time

Tools Used

Expected Outcome

  • Friendly, helpful responses for learning topics

Platform Examples

  • Browser apps, educational portals

7. Resume Content Helper

Objectives

  • Assist in creating professional resume content
  • Teach how to summarize skills and experiences

Tools Used

Expected Outcome

  • Clear, structured resume sections

Platform Examples

  • Career websites, document editors

8. Email Writing Assistant

Objectives

  • Help write professional emails
  • Improve formal communication skills

Tools Used

Expected Outcome

  • Polished, ready-to-send email drafts

Platform Examples

  • Web applications, student portals

9. Vocabulary Builder Tool

Objectives

  • Improve word knowledge and usage
  • Enhance reading and writing skills

Tools Used

Expected Outcome

  • Daily words with examples and meanings

Platform Examples

  • Mobile apps, web dashboards

10. FAQ Generator

Objectives

  • Generate FAQs from input text
  • Help users understand content quickly

Tools Used

Expected Outcome

  • List of clear, relevant questions and answers

Platform Examples

  • Educational websites, student portals

11. Story Writing Assistant

Objectives

  • Guide beginners to write creative stories
  • Improve writing skills and imagination

Tools Used

Expected Outcome

  • Short stories that are well-structured and readable

Platform Examples

  • Writing apps, online platforms

12. Lesson Plan Generator

Objectives

  • Create simple lesson plans for any topic
  • Help students and teachers organize content

Tools Used

Expected Outcome

  • Clear, step-by-step lesson outlines

Platform Examples

  • Educational portals, classroom apps

13. Code Explanation Tool

Objectives

  • Explain small code snippets in plain language
  • Help beginners understand programming logic

Tools Used

Expected Outcome

  • Easy-to-read explanations for learning purposes

Platform Examples

  • Coding websites, student portals

14. Quiz Question Generator

Objectives

  • Automatically create multiple-choice or short-answer questions
  • Test knowledge of any topic

Tools Used

Expected Outcome

  • Ready-to-use quiz questions

Platform Examples

  • Online quiz platforms, educational tools

15. Daily Learning Tip Generator

Objectives

  • Provide useful daily study or productivity tips
  • Encourage habit-building and self-learning

Tools Used

Expected Outcome

  • Short, helpful tips delivered daily

Platform Examples

  • Mobile dashboards, student apps

16. Reading Comprehension Helper

Objectives

  • Simplify difficult reading passages
  • Improve understanding and retention

Tools Used

Expected Outcome

  • Easy explanations for better comprehension

Platform Examples

  • Educational apps, web platforms

17. Blog Outline Generator

Objectives

  • Create structured outlines for blogs or assignments
  • Help students plan content efficiently

Tools Used

Expected Outcome

  • Clear headings, subheadings, and points

Platform Examples

  • Writing platforms, content websites

18. Interview Question Practice Tool

Objectives

  • Provide practice questions for interviews or oral exams
  • Build confidence in responding

Tools Used

Expected Outcome

  • Structured questions with suggested answers

Platform Examples

  • Career prep websites, apps

19. Research Topic Finder

Objectives

  • Suggest interesting research topics
  • Encourage academic exploration

Tools Used

Expected Outcome

  • Shortlist of suitable topics with summaries

Platform Examples

  • Academic portals, student dashboards

20. Feedback Writing Assistant

Objectives

  • Help students write professional feedback
  • Improve communication and reflection skills

Tools Used

Expected Outcome

  • Polished, structured feedback messages

Platform Examples

  • Educational platforms, workplace tools

21. Simple Translation Help Tool

Objectives

  • Translate short text accurately
  • Help students understand content in different languages

Tools Used

Expected Outcome

  • Accurate translations ready to use

Platform Examples

Why Choose Gemma for Student Projects?

Gemma is a beginner-friendly tool that makes learning and building projects easier for students. It allows students to create practical applications without needing advanced coding skills. With Gemma, students can experiment, test ideas, and see results quickly, which boosts confidence and encourages hands-on learning.

Using Gemma also helps students understand important concepts like logic, structure, and problem-solving. It gives students an easy way to focus on learning and being creative instead of dealing with complicated installations or programming errors. Overall, Gemma is a great choice for students who want to learn by doing and improve both their technical and analytical skills.

Why Beginners Should Build Gemma Projects

  • Learn how modern AI tools work.
  • Apply concepts practically.
  • Improve problem-solving skills.
  • Build confidence in coding and writing.
  • Prepare for advanced projects in the future.

Conclusion

Gemma projects provide beginners with an easy and practical way to learn and create. These 20+ Gemma project ideas focus on simplicity, creativity, and real-world application. By building these projects, students can strengthen problem-solving skills, logical thinking, and confidence. Working on projects teaches practical skills and reinforces concepts better than memorization. Beginners can start small, experiment safely, and gradually take on bigger challenges. Overall, Gemma is a great platform to turn learning into doing, preparing students for future academic and technical growth.

Programming an estimation command in Stata: Consolidating your code



I write ado-commands that estimate the parameters of an exponential conditional mean (ECM) model and a probit conditional mean (PCM) model by nonlinear least squares, using the methods that I discussed in the post Programming an estimation command in Stata: Nonlinear least-squares estimators. These commands will either share a lot of code or repeat a lot of code, because they are so similar. It is almost always better to share code than to repeat code. Shared code only needs to be changed in one place to add a feature or to fix a problem; repeated code must be changed everywhere. I introduce Mata libraries to share Mata functions across ado-commands, and I introduce wrapper commands to share ado-code.

This is the twenty-seventh post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Ado-commands for ECM and PCM models

I now convert the examples of NLS for ECM and PCM models discussed in Programming an estimation command in Stata: Nonlinear least-squares estimators to ado-commands. To keep things simple, the ado-commands discussed here omit many standard features such as factor variables, time-series variables, robust estimators of the VCE, and cluster–robust estimators of the VCE.

mynlexp1 implements an NLS estimator for the parameters of the ECM model.

Code block 1: mynlexp1.ado


*! version 1.0.0  09May2016
program define mynlexp1, eclass sortpreserve
    version 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: mywork("`depvar'", "`indeps'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'" )

    if "`constant'" == "" {
        local indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn post `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mynlexp"

    ereturn display

end

mata:

void MYNLExp(real scalar todo, real vector b,   ///
        real vector y, real matrix X,           ///
        val, grad, hess)
{
        real vector  r, f, xb
        real matrix  df

        xb  = X*b'
        f   = exp(xb)
        r   = y - f
        val = -(r:^2)
        df  = f:*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void mywork( string scalar depvar,  string scalar indeps,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{
    real vector y, b
    real matrix X, V
    real scalar n, p, ssr
    transmorphic S

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indeps, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &MYNLExp())
    optimize_init_params(S, J(1, p, .01))
    optimize_init_evaluatortype(S, "gf2")
    optimize_init_conv_vtol(S, 1e-10)
    b   = optimize(S)
    V   = invsym(-1*optimize_result_Hessian(S))
    ssr = (-1/(n-p))*optimize_result_value(S)
    V   = ssr*V

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, p)

}
end

Lines 2–29 define the ado-command, which uses the Mata work function mywork() defined on lines 54–89. Lines 33–52 define the evaluator function MYNLExp() used by optimize() in mywork(). This structure should be familiar from the Poisson regression examples previously discussed.

mynlprobit1 implements an NLS estimator for the parameters of the PCM model.

Code block 2: mynlprobit1.ado


*! version 1.0.0  09May2016
program define mynlprobit1, eclass sortpreserve
    version 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: mywork("`depvar'", "`indeps'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'" )

    if "`constant'" == "" {
        local indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn post `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mynlprobit"

    ereturn display

end

mata:

void MYNLProbit(real scalar todo, real vector b,  ///
        real vector y, real matrix X,             ///
        val, grad, hess)
{
        real vector  r, f, xb
        real matrix  df

        xb  = X*b'
        f   = normal(xb)
        r   = y - f
        val = -(r:^2)
        df  = normalden(xb):*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void mywork( string scalar depvar,  string scalar indeps,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{

    real vector y, b
    real matrix X, V
    real scalar n, p, ssr
    transmorphic S

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indeps, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &MYNLProbit())
    optimize_init_params(S, J(1, p, .01))
    optimize_init_evaluatortype(S, "gf2")
    optimize_init_conv_vtol(S, 1e-10)
    b   = optimize(S)
    V   = invsym(-1*optimize_result_Hessian(S))
    ssr = (-1/(n-p))*optimize_result_value(S)
    V   = ssr*V

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, p)
}
end

The code for mynlprobit1 is nearly identical to that for mynlexp1.

Duplicated code is dangerous. Any time you have to add a feature or fix a problem, you must do it twice. I highly recommend that you avoid duplicated code, and I illustrate how by rewriting these commands to have a single Mata code base and then a single ado-code base.

Libraries of Mata code

The mywork() functions used in mynlexp1 and mynlprobit1 differ only in the evaluator function they call; see line 76 in code blocks 1 and 2. I would like to have one mywork() function that is called by mynlexp1 and by mynlprobit1.

Mata functions defined at the bottom of an ado-file are local to that ado-file, so I cannot use this method for defining the mywork() function that will be used by both mynlexp1 and mynlprobit1. What I need is a file containing compiled Mata functions that are callable from any other Mata function or from within any ado-file or do-file. This type of file is known as a library.

I use mynllib.mata in code block 3 to make the library lmynllib.mlib containing the compiled Mata functions MYNLWork(), MYNLProbit(), and MYNLExp().

Code block 3: mynllib.mata


mata:
mata clear

void MYNLExp(real scalar todo, real vector b,   ///
        real vector y, real matrix X,           ///
        val, grad, hess)
{
        real vector  r, f
        real matrix  df

        f   = exp(X*b')
        r   = y - f
        val = -(r:^2)
        df  = f:*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void MYNLProbit(real scalar todo, real vector b,  ///
        real vector y, real matrix X,             ///
        val, grad, hess)
{
        real vector  r, f, xb
        real matrix  df

        xb  = X*b'
        f   = normal(xb)
        r   = y - f
        val = -(r:^2)
        df  = normalden(xb):*X

        if (todo>=1) {
                grad = r:*df
        }
        if (todo==2) {
                hess = -1*quadcross(df, df)
        }
}

void MYNLWork( string scalar depvar,  string scalar indeps,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar model)
{

    real vector       y, b
    real matrix       X, V
    real scalar       n, p, ssr
    string scalar     emsg
    pointer(function) f
    transmorphic      S

    if (model=="expm") {
        f = &MYNLExp()
    }
    else if (model=="probit") {
        f = &MYNLProbit()
    }
    else {
        emsg = "{red}model " + model + " invalid\n"
        printf(emsg)
        exit(error(498))
    }
    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indeps, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, f)
    optimize_init_params(S, J(1, p, .01))
    optimize_init_evaluatortype(S, "gf2")
    optimize_init_conv_vtol(S, 1e-10)
    b   = optimize(S)
    V   = invsym(-1*optimize_result_Hessian(S))
    ssr = (-1/(n-p))*optimize_result_value(S)
    V   = ssr*V

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, p)
}

mata mlib create lmynllib, replace
mata mlib add    lmynllib MYNLWork() MYNLProbit() MYNLExp()

end

Lines 1 and 99 open and close the Mata session in which I define the functions, create the library, and add the functions to the library. Lines 4–22 define MYNLExp(), which I have already discussed. Lines 24–43 define MYNLProbit(), which I have already discussed. Lines 45–94 define MYNLWork(), which is the work function that both mynlexp2.ado and mynlprobit2.ado will use. Note that I have used uppercase letters in the names MYNLExp(), MYNLProbit(), and MYNLWork(). Functions in Mata libraries are global: they can be called from anywhere, and their names must be unique in the space of names for Mata functions. I try to avoid using function names that other programmers might use by prefixing the names of my functions with an uppercase name of the library and beginning the function name with an uppercase letter.

Line 49 specifies that the ninth argument to MYNLWork() is a string scalar called model inside the function. The ado-commands will pass either “expm” or “probit” in this argument.

If model contains “expm”, line 60 stores the address of the function MYNLExp() in f. The variable type that holds the address of a function is known as a pointer to a function. For this reason, line 56 declares f to be a pointer to a function. If model contains “probit”, line 63 stores the address of the function MYNLProbit() in f. If model does not contain “expm” or “probit”, lines 66–68 display an error message and exit.

Pointers hold the address of an object. All I need here is a box that holds the address of the evaluator function corresponding to the model fit by the ado-command that calls MYNLWork(). Line 56 declares f to be this kind of box, lines 60 and 63 store the address of the correct function in f, and line 81 puts the address stored in f into the optimize object S. Type help M-2 pointers to learn more about pointers.

The remaining lines of MYNLWork() are the same as the lines in the mywork() functions in mynlexp1 and mynlprobit1.

There is still a lot of duplicated code in the evaluator functions MYNLExp() and MYNLProbit(). Instead of using a pointer to the evaluator function, I could have consolidated the evaluator functions MYNLExp() and MYNLProbit() into a single function that used an additional argument to determine which case to evaluate. I chose the presented method because it is faster. Consolidating the evaluator functions would have slowed down the function that I most want to speed up. (The evaluator function is called many times by optimize().) So, in this case, I accepted the risk of duplicated code for the benefit of speed.

Line 96 creates the Mata library lmynllib.mlib in the current directory, replacing any previously defined version of this library. Line 97 puts the compiled versions of MYNLWork(), MYNLProbit(), and MYNLExp() into lmynllib.mlib. At this point, the file lmynllib.mlib in the current directory contains the compiled functions MYNLWork(), MYNLProbit(), and MYNLExp().

mynllib.mata is a do-file that makes a Mata library, hence the .mata suffix instead of the .do suffix. I can execute it by typing do mynllib.mata. Example 1 makes the library lmynllib.mlib.

Example 1: Making a Mata library


. program drop _all

. mata: mata clear

. quietly do mynllib.mata

. mata: mata mlib index
.mlib libraries to be searched are now
    lmatabase;lmatapss;lmataado;lmatapostest;lmatafc;lmatasem;lmatapath;
> lmatamcmc;lmatagsem;lmataopt;lmynllib;lfreduse;lpoparms;lspmat

After dropping all the ado-commands in memory and clearing Mata, I used quietly do mynllib.mata to make the library, because I do not want to see the code again. mata: mata mlib index updates the list of libraries known to Mata; this step adds lmynllib.mlib to the list of known libraries so that I can use the functions defined therein.

Having made lmynllib.mlib and added it to the list of known libraries, I can use the functions defined therein in one-line Mata calls in my ado-commands. Consider mynlexp2.

Code block 4: mynlexp2.ado


*! version 2.0.0  10May2016
program define mynlexp2, eclass sortpreserve
    version 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'", "expm" )

    if "`constant'" == "" {
        local indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn post `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mynlexp"

    ereturn display

end

The code for mynlexp2 is almost the same as the ado-code for mynlexp1 in code block 1. The only differences are that line 12 calls MYNLWork() instead of mywork() and that MYNLWork() accepts a ninth argument, specified on line 13 to be “expm”.

Now consider mynlprobit2 in code block 5.

Code block 5: mynlprobit2.ado


*! version 2.0.0  10May2016
program define mynlprobit2, eclass sortpreserve
    version 14.1

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'", "probit" )

    if "`constant'" == "" {
        local indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn post `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mynlprobit"

    ereturn display

end

The analogous changes are made to mynlprobit2 that were made to mynlexp2. In particular, note that line 13 passes “probit” in the ninth argument to MYNLWork().

Writing a work ado-command

The Mata library allowed me to consolidate the duplicated Mata code. I still have a lot of duplicated ado-code. To consolidate the duplicated ado-code, I have mynlexp3 and mynlprobit3 call a single ado-command that does the work, as seen in code blocks 6 and 7.

Code block 6: mynlexp3.ado


*! version 3.0.0  11May2016
program define mynlexp3
    version 14.1

    mynlwork expm `0'

end

Code block 7: mynlprobit3.ado


*! version 3.0.0  11May2016
program define mynlprobit3, eclass sortpreserve
    version 14.1

    mynlwork probit `0'

end

Recall that whatever the user specified is contained in the local macro 0. Line 5 of mynlexp3 passes whatever the user specified, prefixed with “expm”, to mynlwork. Similarly, line 5 of mynlprobit3 passes whatever the user specified, prefixed with “probit”, to mynlwork. mynlexp3 and mynlprobit3 are called wrapper commands, because they just wrap calls to mynlwork, which does the actual work.

Code block 8 contains the code for mynlwork.

Code block 8: mynlwork.ado


*! version 1.0.0  11May2016
program define mynlwork, eclass sortpreserve
    version 14.1

    gettoken model 0 : 0

    if "`model'" == "expm" {
        local cname "mynlexp"
    }
    else if "`model'" == "probit" {
        local cname "mynlprobit"
    }
    else {
        display "{red}model `model' invalid"
        exit 498
    }

    syntax varlist [if] [in] [,  noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist

    tempname b V N rank

    mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'", "`model'" )

    if "`constant'" == "" {
        local indeps "`indeps' _cons"
    }
    matrix colnames `b' = `indeps'
    matrix colnames `V' = `indeps'
    matrix rownames `V' = `indeps'

    ereturn post `b' `V', esample(`touse')
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "`cname'"

    ereturn display

end

Line 5 uses gettoken to put the model prefixed to the user input by mynlexp3 or mynlprobit3 in the local macro model and to put whatever the user specified in the local macro 0. Lines 7–16 put the name of the calling command in the local macro cname, or exit with an error message if the model is not recognized. In theory, the error case handled in lines 13–16 is not necessary, because I should know how to call my own command. Experience has taught me that handling these extra error cases makes changing the code in the future much easier, so I consider this good practice.

By line 17, the local macro model and the local macro cname contain all that differs between the cases handled by mynlexp3 and mynlprobit3. model is passed to MYNLWork() on line 26, and cname is used to store the command name in e(cmd) on line 38.

Examples 2 and 3 illustrate that mynlexp3 and mynlprobit3 produce output equivalent to the results produced by nl in examples 2 and 4 in Programming an estimation command in Stata: Nonlinear least-squares estimators.

Example 2: mynlexp3 output


. mynlexp3 accidents cvalue tickets
Iteration 0:   f(p) =  -2530.846
Iteration 1:   f(p) = -1116.4901
Iteration 2:   f(p) = -248.56923
Iteration 3:   f(p) = -225.91644
Iteration 4:   f(p) = -225.89573
Iteration 5:   f(p) = -225.89573
Iteration 6:   f(p) = -225.89573
----------------------------------------------------------------------------
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
    cvalue |   .1759434   .0323911     5.43   0.000     .1124581    .2394287
   tickets |   1.447672   .0333599    43.40   0.000     1.382287    1.513056
     _cons |  -7.660608   .2355725   -32.52   0.000    -8.122322   -7.198894
----------------------------------------------------------------------------

Example 3: mynlprobit3 output


. mynlprobit3 hadaccident cvalue tickets
Iteration 0:   f(p) = -132.90997
Iteration 1:   f(p) = -16.917203
Iteration 2:   f(p) = -10.995001
Iteration 3:   f(p) = -10.437501
Iteration 4:   f(p) = -10.427738
Iteration 5:   f(p) = -10.427156
Iteration 6:   f(p) = -10.427123
Iteration 7:   f(p) = -10.427121
Iteration 8:   f(p) =  -10.42712
Iteration 9:   f(p) =  -10.42712
Iteration 10:  f(p) =  -10.42712
----------------------------------------------------------------------------
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
    cvalue |   .3616322   .0918214     3.94   0.000     .1816656    .5415988
   tickets |   2.177509   .1974173    11.03   0.000     1.790578     2.56444
     _cons |  -10.95166    1.05565   -10.37   0.000    -13.02069   -8.882622
----------------------------------------------------------------------------

Done and undone

It is almost always better to share code than to repeat code. Shared code only needs to be changed in one place to add a feature or to fix a problem; repeated code must be changed everywhere. I introduced Mata libraries to share Mata functions across ado-commands, and I introduced wrapper commands to share ado-code.



The philosophical puzzle of rational artificial intelligence | MIT News


To what extent can an artificial system be rational?

A new MIT course, 6.S044/24.S00 (AI and Rationality), doesn’t seek to answer this question. Instead, it challenges students to explore this and other philosophical problems through the lens of AI research. For the next generation of scholars, concepts of rationality and agency could prove integral to AI decision-making, especially when influenced by how humans understand their own cognitive limits and their constrained, subjective views of what is or isn’t rational.

This inquiry is rooted in a deep relationship between computer science and philosophy, which have long collaborated in formalizing what it is to form rational beliefs, learn from experience, and make rational decisions in pursuit of one’s goals.

“You’d imagine computer science and philosophy are pretty far apart, but they’ve always intersected. The technical parts of philosophy really overlap with AI, especially early AI,” says course instructor Leslie Kaelbling, the Panasonic Professor of Computer Science and Engineering at MIT, calling to mind Alan Turing, who was both a computer scientist and a philosopher. Kaelbling herself holds an undergraduate degree in philosophy from Stanford University, noting that computer science wasn’t available as a major at the time.

Brian Hedden, a professor in the Department of Linguistics and Philosophy who holds an MIT Schwarzman College of Computing shared position with the Department of Electrical Engineering and Computer Science (EECS) and teaches the class with Kaelbling, notes that the two disciplines are more aligned than people might think, adding that the “differences are in emphasis and perspective.”

Tools for further theoretical thinking

Offered for the first time in fall 2025, AI and Rationality was created by Kaelbling and Hedden as part of the Common Ground for Computing Education, a cross-cutting initiative of the MIT Schwarzman College of Computing that brings multiple departments together to develop and teach new courses and launch new programs that blend computing with other disciplines.

With over two dozen students registered, AI and Rationality is one of two Common Ground classes with a foundation in philosophy, the other being 6.C40/24.C40 (Ethics of Computing).

While Ethics of Computing explores problems concerning the societal impacts of rapidly advancing technology, AI and Rationality examines the disputed definition of rationality by considering multiple factors: the nature of rational agency, the concept of a fully autonomous and intelligent agent, and the ascription of beliefs and desires onto these systems.

Because AI is extremely broad in its implementation and each use case raises different issues, Kaelbling and Hedden brainstormed topics that would provide fruitful discussion and engagement between the two perspectives of computer science and philosophy.

“It is important, when I work with students studying machine learning or robotics, that they step back a bit and examine the assumptions they’re making,” Kaelbling says. “Thinking about things from a philosophical perspective helps people back up and understand better how to situate their work in actual context.”

Both instructors stress that this is not a course that provides concrete answers to questions about what it means to engineer a rational agent.

Hedden says, “I see the course as building their foundations. We’re not giving them a body of doctrine to learn and memorize and then apply. We’re equipping them with tools to think about things in a critical way as they go out into their chosen careers, whether they’re in research or industry or government.”

The rapid progress of AI also presents a new set of challenges in academia. Predicting what students may need to know five years from now is something Kaelbling sees as an impossible task. “What we need to do is give them the tools at a higher level — the habits of mind, the ways of thinking — that will help them approach the stuff that we really can’t anticipate right now,” she says.

Blending disciplines and questioning assumptions

So far, the class has drawn students from a range of disciplines — from those firmly grounded in computing to others interested in exploring how AI intersects with their own fields of study.

Throughout the semester’s readings and discussions, students grappled with different definitions of rationality and how they pushed back against assumptions in their fields.

On what surprised her about the course, Amanda Paredes Rioboo, a senior in EECS, says, “We’re kind of taught that math and logic are this golden standard or truth. This class showed us a variety of examples where humans act inconsistently with these mathematical and logical frameworks. We opened up this whole can of worms as to whether it is humans that are irrational. Is it the machine learning systems that we designed that are irrational? Is it math and logic itself?”

Junior Okoroafor, a PhD student in the Department of Brain and Cognitive Sciences, appreciated the class’s challenges and the ways in which the definition of a rational agent can change depending on the discipline. “Representing what each field means by rationality in a formal framework makes it clear exactly which assumptions are shared, and which are different, across fields.”

The co-teaching, collaborative structure of the course, as with all Common Ground endeavors, gave students and the instructors opportunities to hear different perspectives in real time.

For Paredes Rioboo, this is her third Common Ground course. She says, “I really like the interdisciplinary aspect. They’ve always felt like a nice mix of theoretical and applied from the fact that they have to cut across fields.”

According to Okoroafor, Kaelbling and Hedden demonstrated an obvious synergy between fields, saying that it felt as if they were engaging and learning along with the class. Seeing how computer science and philosophy can be used to inform one another allowed him to understand their commonality and their invaluable perspectives on intersecting issues.

He adds, “philosophy also has a way of surprising you.”

Who profits from AI? Not OpenAI, says think tank


To answer question one, the researchers created a case study they called the GPT-5 bundle, which they said included all of OpenAI’s offerings available during GPT-5’s lifetime as the flagship model, including GPT-5 and GPT-5.1, GPT-4o, ChatGPT, and the API, and estimated the revenue from and the costs of operating the bundle. All numbers gathered were based on sources of information that included claims by OpenAI and its staff, and reporting by media outlets, primarily The Information, CNBC, and the Wall Street Journal.

The revenue estimate, they said, “is relatively straightforward”. Since the bundle included all of OpenAI’s models, it was the company’s total revenue over GPT-5’s lifetime from August to December last year: $6.1 billion.

And, they pointed out, “at first glance, $6.1 billion sounds healthy, until you juxtapose it with the costs of operating the GPT-5 bundle.” Those costs come from four main sources, the report said, the first of which is inference compute, at a cost of $3.2 billion. That number is based on public estimates of OpenAI’s total inference compute spend in 2025, and assumes that the allocation of compute during GPT-5’s tenure was proportional to the fraction of the year’s revenue generated in that period.

Parallelized sampling using exponential variates


As part of our recent work to support weighted sampling of Spark data frames in sparklyr, we embarked on a journey searching for algorithms that can perform weighted sampling, especially sampling without replacement, in efficient and scalable ways within a distributed cluster-computing framework, such as Apache Spark.

In the interest of brevity, “weighted sampling without replacement” will be shortened to SWoR for the remainder of this blog post.

In the following sections, we will explain and illustrate what SWoR means probability-wise, briefly outline some alternative solutions we have considered but were not completely satisfied with, and then deep-dive into exponential variates, a simple mathematical construct that made the ideal solution for this problem possible.

If you can’t wait to jump into action, there is also a section in which we showcase example usages of sdf_weighted_sample() in sparklyr. In addition, you can learn about the implementation details of sparklyr::sdf_weighted_sample() in this pull request.
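
As a quick preview, a call along the following lines sketches the intended usage. This is a sketch only: the argument names weight_col, k, and replacement are assumptions about the sdf_weighted_sample() signature rather than something spelled out in this post, so check the package documentation for the authoritative interface.

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
mtcars_sdf <- copy_to(sc, mtcars, overwrite = TRUE)

# Weighted sampling without replacement, using `gear` as the weight column.
# NOTE: argument names here are assumed, not quoted from this post.
sampled <- mtcars_sdf %>%
  sdf_weighted_sample(weight_col = "gear", k = 8, replacement = FALSE)

sampled %>% collect()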

How it began

Our journey started from a GitHub issue inquiring about the possibility of supporting the equivalent of dplyr::sample_frac(..., weight = ) for Spark data frames in sparklyr. For example,

dplyr::sample_frac(mtcars, 0.25, weight = gear, replace = FALSE)
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

will randomly select one-fourth of all rows from an R data frame named “mtcars” without replacement, using mtcars$gear as weights. We were unable to find any function implementing the weighted versions of dplyr::sample_frac among Spark SQL built-in functions in Spark 3.0 or in earlier versions, which means a future version of sparklyr will need to run its own weighted sampling algorithm to support such use cases.

What exactly is SWoR

The purpose of this section is to mathematically describe the probability distribution generated by SWoR in terms of \(w_1, \dotsc, w_N\), so that readers can clearly see that the exponential-variate-based algorithm presented in a subsequent section in fact samples from exactly the same probability distribution. Readers already having a crystal-clear mental picture of what SWoR entails should probably skip most of this section. The key take-away here is: given \(N\) rows \(r_1, \dotsc, r_N\) with weights \(w_1, \dotsc, w_N\) and a desired sample size \(n\), the probability of SWoR selecting \((r_1, \dotsc, r_n)\) is \(\prod\limits_{j = 1}^{n} \left( w_j \middle/ \sum\limits_{k = j}^{N} w_k \right)\).

SWoR is conceptually equivalent to an \(n\)-step process of selecting 1 out of the \((N - j + 1)\) remaining rows in the \(j\)-th step, for \(j \in \{1, \dotsc, n\}\), with each remaining row’s chance of being selected linearly proportional to its weight in any of the steps, i.e.,

samples := {}
population := {r[1], ..., r[N]}

for j = 1 to n
  select r[x] from population with probability
    (w[x] / TotalWeight(population))
  samples := samples + {r[x]}
  population := population - {r[x]}

Notice the outcome of a SWoR process is in fact order-significant, which is why in this post it will always be represented as an ordered tuple of elements.

Intuitively, SWoR is analogous to throwing darts at a group of tiles. For example, let’s say the size of our sample space is 5:

  • Imagine \(r_1, r_2, \dotsc, r_5\) as 5 rectangular tiles laid out contiguously on a wall with widths \(w_1, w_2, \dotsc, w_5\), with \(r_1\) covering \([0, w_1)\), \(r_2\) covering \([w_1, w_1 + w_2)\), …, and \(r_5\) covering \(\left[\sum\limits_{j = 1}^{4} w_j, \sum\limits_{j = 1}^{5} w_j\right)\)

  • Equate drawing a random sample in each step to throwing a dart uniformly randomly within the interval covered by all tiles that are not hit yet

  • After a tile is hit, it gets taken out and the remaining tiles are re-arranged so that they continue to cover a contiguous interval without overlapping

If our sample size is 3, then we shall ask ourselves: what is the probability of the dart hitting \((r_1, r_2, r_3)\) in that order?

In step \(j = 1\), the dart will hit \(r_1\) with probability \(\left. w_1 \middle/ \left(\sum\limits_{k = 1}^{N} w_k\right) \right.\).

After deleting \(r_1\) from the sample space once it is hit, step \(j = 2\) will look like this:

[figure: step 2]

and the probability of the dart hitting \(r_2\) in step 2 is \(\left. w_2 \middle/ \left(\sum\limits_{k = 2}^{N} w_k\right) \right.\).

Finally, moving on to step \(j = 3\), we have:

[figure: step 3]

with the probability of the dart hitting \(r_3\) being \(\left. w_3 \middle/ \left(\sum\limits_{k = 3}^{N} w_k\right) \right.\).

So, combining all of the above, the overall probability of selecting \((r_1, r_2, r_3)\) is \(\prod\limits_{j = 1}^{3} \left( w_j \middle/ \sum\limits_{k = j}^{N} w_k \right)\).
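
For a concrete illustration with made-up numbers, suppose \(N = 5\) and the weights are \(w_1, \dotsc, w_5 = 1, 2, 3, 4, 5\). The probability of drawing \((r_1, r_2, r_3)\) in that order is then

\[\frac{1}{15} \cdot \frac{2}{14} \cdot \frac{3}{12} = \frac{1}{420},\]

where each denominator is the total weight of the tiles still on the wall at that step.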

Naive approaches for implementing SWoR

This section outlines some possible approaches that were briefly under consideration. Because none of these approaches scales well to a large number of rows or a non-trivial number of partitions in a Spark data frame, we decided to avoid all of them in sparklyr.

A tree-based approach

One possible way to accomplish SWoR is to have a mutable data structure keeping track of the sample space at each step.

Continuing with the dart-throwing analogy from the previous section, let us say initially none of the tiles has been taken out yet, and a dart has landed at some point \(x \in \left[0, \sum\limits_{k = 1}^{N} w_k\right)\). Which tile did it hit? This can be answered efficiently if we have a binary tree, pictured as the following (or in general, some \(b\)-ary tree for integer \(b \ge 2\)):

[figure: binary tree over tile widths]

To find the tile that was hit given the dart’s position \(x\), we simply need to traverse down the tree, going through the box containing \(x\) at each level, incurring an \(O(\log(N))\) cost in time complexity for each sample. To take a tile out of the picture, we update the width of the tile to \(0\) and propagate this change upwards from the leaf level to the root of the tree, again incurring an \(O(\log(N))\) cost in time complexity, making the overall time complexity of selecting \(n\) samples \(O(n \cdot \log(N))\), which is not so great for large data sets, and also not parallelizable across multiple partitions of a Spark data frame.

Rejection sampling

Another possible approach is to use rejection sampling. In terms of the previously mentioned dart-throwing analogy, that means not removing any tile that is hit, hence avoiding the performance cost of keeping the sample space up-to-date, but then having to re-throw the dart in each of the subsequent rounds until the dart lands on a tile that was not hit previously. This approach, just like the previous one, would not be performant, and would not be parallelizable across multiple partitions of a Spark data frame either.

A solution that has proven to be much better than any of the naive approaches turns out to be a numerically stable variant of the algorithm described in “Weighted Random Sampling” (Efraimidis and Spirakis 2016) by Pavlos S. Efraimidis and Paul G. Spirakis.

A version of this sampling algorithm implemented by sparklyr does the following to sample \(n\) out of \(N\) rows from a Spark data frame \(X\) (a minimal local-R sketch of the same idea follows the list):

  • For each row \(r_j \in X\), draw a random number \(u_j\) independently and uniformly randomly from \((0, 1)\) and compute the key of \(r_j\) as \(k_j = \ln(u_j) / w_j\), where \(w_j\) is the weight of \(r_j\). Perform this calculation in parallel across all partitions of \(X\).
  • Select the \(n\) rows with the largest keys and return them as the result. This step is also mostly parallelizable: for each partition of \(X\), one can select up to \(n\) rows having the largest keys within that partition as candidates, and after selecting candidates from all partitions in parallel, simply extract the top \(n\) rows among all candidates and return them as the \(n\) chosen samples.
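
To make these two steps concrete, here is a minimal, plain-R sketch of the same key-based selection on a local data frame. It is an illustration of the idea only, not sparklyr's distributed implementation; the function name and the choice of mtcars$gear as weights are made up for this example.

set.seed(42)

swor_by_keys <- function(data, weights, n) {
  u <- runif(nrow(data))      # one independent U(0, 1) draw per row
  keys <- log(u) / weights    # k_j = ln(u_j) / w_j
  # keep the n rows with the largest keys
  data[order(keys, decreasing = TRUE)[seq_len(n)], ]
}

# Example: draw 8 of the 32 rows of mtcars without replacement, weighted by gear
swor_by_keys(mtcars, mtcars$gear, 8)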

There are at least 4 reasons why this solution is highly appealing and was chosen to be implemented in sparklyr:

  • It is a one-pass algorithm (i.e., only need to iterate through all rows of a data frame exactly once).
  • Its computational overhead is quite low (as selecting the top \(n\) rows at any stage only requires a bounded priority queue of max size \(n\), which costs \(O(\log(n))\) per update in time complexity).
  • More importantly, most of its required computations can be performed in parallel. In fact, the only non-parallelizable step is the very last stage of combining top candidates from all partitions and choosing the top \(n\) rows among those candidates. So, it fits very well into the world of Spark / MapReduce, and has drastically better horizontal scalability compared to the naive approaches.
  • Bonus: It is also suitable for weighted reservoir sampling (i.e., it can sample \(n\) out of a possibly infinite stream of rows according to their weights such that at any moment the \(n\) samples will be a weighted representation of all rows that have been processed so far).

Why does this algorithm work

As an interesting aside, some readers have probably seen this technique presented in a slightly different form under another name. It is in fact equivalent to a generalized version of the Gumbel-max trick which is commonly referred to as the Gumbel-top-k trick. Readers familiar with properties of the Gumbel distribution will no doubt have an easy time convincing themselves the algorithm above works as expected.
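For readers who prefer to see that correspondence spelled out, here is a short sketch of it, using only the definitions above: with \(u_j \sim \mathrm{Uniform}(0, 1)\), the quantity \(G_j = -\ln(-\ln u_j)\) is a standard Gumbel variate, and

\[
\ln(w_j) + G_j = \ln(w_j) - \ln(-\ln u_j) = -\ln\!\left(\frac{-\ln u_j}{w_j}\right) = -\ln(-k_j),
\]

where \(k_j = \ln(u_j) / w_j\) is the key defined above. Because \(-\ln(-k_j)\) is an increasing function of \(k_j\) (all keys are negative), ranking rows by decreasing key is the same as ranking them by decreasing \(\ln(w_j) + G_j\), which is exactly the Gumbel-top-k trick.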

In this section, we will also present a proof of correctness for this algorithm based on elementary properties of probability density function (shortened as PDF from now on), cumulative distribution function (shortened as CDF from now on), and basic calculus.

First of all, to make sense of all the \(\ln(u_j) / w_j\) calculations in this algorithm, one has to understand inverse transform sampling. For each \(j \in \{1, \dotsc, N\}\), consider the probability distribution defined on \((-\infty, 0)\) with CDF \(F_j(x) = e^{w_j \cdot x}\). In order to pluck out a value \(y\) from this distribution, we first sample a value \(u_j\) uniformly at random out of \((0, 1)\) that determines the percentile of \(y\) (i.e., how our \(y\) value ranks relative to all possible \(y\) values, a.k.a. the “overall population,” from this distribution), and then apply \(F_j^{-1}\) to \(u_j\) to find \(y\), so \(y = F_j^{-1}(u_j) = \ln(u_j) / w_j\).
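Spelled out, the inversion is one line of algebra: setting \(F_j(y) = u_j\) gives

\[
e^{w_j \cdot y} = u_j \quad\Longrightarrow\quad w_j \cdot y = \ln(u_j) \quad\Longrightarrow\quad y = \ln(u_j) / w_j,
\]

which is exactly the key computed for row \(r_j\).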

Secondly, after defining all the required CDF functions \(F_j(x) = e^{w_j \cdot x}\) for \(j \in \{1, \dotsc, N\}\), we can also easily derive their corresponding PDF functions \(f_j\): \[f_j(x) = \frac{d F_j(x)}{dx} = w_j e^{w_j \cdot x}.\]

Lastly, with a clear understanding of the family of probability distributions involved, one can show that the probability of this algorithm selecting a given sequence of rows \((r_1, \dotsc, r_n)\) is equal to \(\prod\limits_{j = 1}^{n} \left( w_j \middle/ \sum\limits_{k = j}^{N} w_k \right)\), identical to the probability previously mentioned in the “What exactly is SWoR” section, which implies the possible outcomes of this algorithm follow exactly the same probability distribution as that of an \(n\)-step SWoR.

In order not to deprive our dear readers of the pleasure of completing this proof by themselves, we have decided not to inline the rest of the proof (which boils down to a calculus exercise) within this blog post, but it is available in this file.

While all previous sections focused solely on weighted sampling without replacement, this section will briefly discuss how the exponential-variate approach can also benefit the weighted-sampling-with-replacement use case (which will be shortened to SWR from now on).

Although SWR with sample size \(n\) can be carried out by \(n\) independent processes each selecting \(1\) sample, parallelizing an SWR workload across all partitions of a Spark data frame (let’s call it \(X\)) will still be more performant if the number of partitions is much larger than \(n\) and more than \(n\) executors are available in a Spark cluster.

An initial solution we had in mind was to run SWR with sample size \(n\) in parallel on each partition of \(X\), and then re-sample the results based on the relative total weights of the partitions. Despite sounding deceptively simple when summarized in words, implementing such a solution in practice would be a rather complicated task. First, one has to apply the alias method or similar in order to perform weighted sampling efficiently on each partition of \(X\), and on top of that, implementing the re-sampling logic across all partitions correctly and verifying the correctness of such a procedure would also require considerable effort.

In comparison, with the help of exponential variates, an SWR carried out as \(n\) independent SWoR processes each selecting \(1\) sample is much simpler to implement, while still being comparable to our initial solution in terms of efficiency and scalability. An example implementation of it (which takes fewer than 60 lines of Scala) is presented in samplingutils.scala.
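As a purely illustrative single-machine sketch (in plain R rather than Scala, and not the implementation linked above), sampling with replacement via \(n\) independent size-\(1\) draws looks like this:

# each of the n draws independently picks the row with the largest key
# ln(u_j) / w_j, i.e., performs a size-1 weighted sample over all rows
weighted_sample_with_replacement <- function(weights, n) {
  stopifnot(all(weights > 0))
  vapply(
    seq_len(n),
    function(i) which.max(log(runif(length(weights))) / weights),
    integer(1)
  )
}

set.seed(142857)
weighted_sample_with_replacement(c(1, 4, 2, 8, 5, 7, 1, 4), n = 5)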

How do we know sparklyr::sdf_weighted_sample() is working as expected? While the rigorous answer to this question is presented in full in the testing section, we thought it would also be useful to first show some histograms that help readers visualize what that test plan is. Therefore, in this section we will do the following:

  • Run dplyr::slice_sample() a number of times on a small sample space, with each run using a different PRNG seed (the sample size will be reduced to \(2\) here so that there will be fewer than 100 possible outcomes and visualization will be easier)
  • Do the same for sdf_weighted_sample()
  • Use histograms to visualize the distribution of sampling outcomes

Throughout this section, we will sample \(2\) elements out of \(\{0, \dotsc, 7\}\) without replacement according to some weights, so the first step is to set up the following in R:

library(sparklyr)

sc <- spark_connect(master = "local")

# `octs` will be our sample space
octs <- data.frame(
  x = seq(0, 7),
  weight = c(1, 4, 2, 8, 5, 7, 1, 4)
)
# `octs_sdf` will be our sample space copied into a Spark data frame
octs_sdf <- copy_to(sc, octs)

sample_size <- 2

In order to tally up and visualize the sampling outcomes efficiently, we can map each possible outcome to an octal number (e.g., \((6, 7)\) gets mapped to \(6 \cdot 8^0 + 7 \cdot 8^1\)) using a helper function to_oct in R:

to_oct <- function(sample) sum(8 ^ seq(0, sample_size - 1) * sample$x)

We also need to tally up sampling outcomes from dplyr::slice_sample() and sparklyr::sdf_weighted_sample() in 2 separate arrays:

max_possible_outcome <- to_oct(list(x = seq(8 - sample_size, 7)))

sdf_weighted_sample_outcomes <- rep(0, max_possible_outcome)
dplyr_slice_sample_outcomes <- rep(0, max_possible_outcome)

Finally, we can run both dplyr::slice_sample() and sparklyr::sdf_weighted_sample() for an arbitrary number of iterations and compare the tallied outcomes from both:

num_sampling_iters <- 1000  # actually we will vary this value from 500 to 5000

for (x in seq(num_sampling_iters)) {
  seed <- x * 97

  sample1 <- octs_sdf %>%
    sdf_weighted_sample(
      k = sample_size, weight_col = "weight", replacement = FALSE, seed = seed
    ) %>%
    collect() %>%
    to_oct()
  sdf_weighted_sample_outcomes[[sample1]] <-
      sdf_weighted_sample_outcomes[[sample1]] + 1

  set.seed(seed) # set random seed for dplyr::slice_sample()
  sample2 <- octs %>%
    dplyr::slice_sample(
      n = sample_size, weight_by = weight, replace = FALSE
    ) %>%
    to_oct()
  dplyr_slice_sample_outcomes[[sample2]] <-
      dplyr_slice_sample_outcomes[[sample2]] + 1
}

After all the hard work above, we can now enjoy plotting the sampling outcomes from dplyr::slice_sample() and those from sparklyr::sdf_weighted_sample() after 500, 1000, and 5000 iterations, and observe how the distributions of both start converging after a large number of iterations.

Sampling outcomes after 500, 1000, and 5000 iterations, shown in 3 histograms:


(you will probably need to view it in a separate tab to see everything clearly)

While parallelized sampling based on exponential variates looks fantastic on paper, there are still plenty of potential pitfalls when it comes to translating such an idea into code, and as usual, a good testing plan is essential to ensure implementation correctness.

For instance, numerical instability issues with floating point numbers would arise if \(\ln(u_j) / w_j\) were replaced by \(u_j^{1 / w_j}\) in the aforementioned computations.
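A quick R illustration of that instability (with a deliberately extreme weight, chosen purely for this example): for very large \(w_j\), the keys \(u_j^{1 / w_j}\) collapse to values indistinguishable from \(1\) in double precision, while \(\ln(u_j) / w_j\) remains perfectly usable:

u <- c(0.2, 0.8)
w <- 1e18          # an extreme weight, purely for illustration

u ^ (1 / w)        # both values print as 1: the ordering information is lost
log(u) / w         # still distinct: roughly -1.6e-18 and -2.2e-19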

Another, more subtle source of error is the usage of PRNG seeds. For example, consider the following:

  def sampleWithoutReplacement(
    rdd: RDD[Row],
    weightColumn: String,
    sampleSize: Int,
    seed: Long
  ): RDD[Row] = {
    val sc = rdd.context
    if (0 == sampleSize) {
      sc.emptyRDD
    } else {
      val random = new Random(seed) // a single seeded Random captured by every partition
      val mapRDDs = rdd.mapPartitions { iter =>
        for (row <- iter) {
          val weight = row.getAs[Double](weightColumn)
          val key = scala.math.log(random.nextDouble) / weight
          
        }
        ...
      }
      ...
    }
  }

Even though it might look OK at first glance, rdd.mapPartitions(...) from the above will cause the same sequence of pseudorandom numbers to be applied to multiple partitions of the input Spark data frame, which can cause undesired bias (i.e., sampling outcomes from one partition may have non-trivial correlation with those from another partition, when such correlation should be negligible in a correct implementation).

The code snippet below is an example implementation in which each partition of the input Spark data frame is sampled using a different sequence of pseudorandom numbers:

  def sampleWithoutReplacement(
    rdd: RDD[Row],
    weightColumn: String,
    sampleSize: Int,
    seed: Long
  ): RDD[Row] = {
    val sc = rdd.context
    if (0 == sampleSize) {
      sc.emptyRDD
    } else {
      val mapRDDs = rdd.mapPartitionsWithIndex { (index, iter) =>
        val random = new Random(seed + index) // a distinct seed per partition

        for (row <- iter) {
          val weight = row.getAs[Double](weightColumn)
          val key = scala.math.log(random.nextDouble) / weight
          
        }

        ...
      }
    ...
  }
}

An example test case in which a two-sided Kolmogorov-Smirnov test is used to compare the distribution of sampling outcomes from dplyr::slice_sample() with that from sparklyr::sdf_weighted_sample() is shown in this file. Such tests have proven effective in surfacing non-obvious implementation errors such as the ones mentioned above.
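As a rough sketch of what such a check can look like (reusing the two tally vectors from the histogram section above, and ignoring the ties warning that stats::ks.test() emits for discrete data), one could expand each tally back into a vector of observed outcomes and compare the two empirical distributions:

sdf_draws <- rep(
  seq_along(sdf_weighted_sample_outcomes), sdf_weighted_sample_outcomes
)
dplyr_draws <- rep(
  seq_along(dplyr_slice_sample_outcomes), dplyr_slice_sample_outcomes
)

# a large p-value means no evidence that the two sampling implementations
# produce different outcome distributions
ks.test(sdf_draws, dplyr_draws)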

Please note the sparklyr::sdf_weighted_sample() functionality is not included in any official release of sparklyr yet. We are aiming to ship it as part of sparklyr 1.4 in about 2 to 3 months from now.

In the meantime, you can try it out with the following steps:

First, make sure remotes is installed, and then run
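a command along the following lines (the snippet assumed here installs the development version of sparklyr from the sparklyr/sparklyr GitHub repository):

remotes::install_github("sparklyr/sparklyr")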

to install sparklyr from source.

Next, create a test data frame with a numeric weight column containing a non-negative weight for each row, and then copy it to Spark (see the code snippet below for an example):

library(sparklyr)

sc <- spark_connect(master = "local")

example_df <- data.frame(
  x = seq(100),
  weight = c(
    rep(1, 50),
    rep(2, 25),
    rep(4, 10),
    rep(8, 10),
    rep(16, 5)
  )
)
example_sdf <- copy_to(sc, example_df, repartition = 5, overwrite = TRUE)

Finally, run sparklyr::sdf_weighted_sample() on example_sdf:

sample_size <- 5

samples_without_replacement <- example_sdf %>%
  sdf_weighted_sample(
    weight_col = "weight",
    k = sample_size,
    replacement = FALSE
  )

samples_without_replacement %>% print(n = sample_size)
## # Source: spark<?> [?? x 2]
##       x weight
##   <int>  <dbl>
## 1    48      1
## 2    22      1
## 3    78      4
## 4    56      2
## 5   100     16
samples_with_replacement <- example_sdf %>%
  sdf_weighted_sample(
    weight_col = "weight",
    k = sample_size,
    replacement = TRUE
  )

samples_with_replacement %>% print(n = sample_size)
## # Source: spark<?> [?? x 2]
##       x weight
##   <int>  <dbl>
## 1    86      8
## 2    97     16
## 3    91      8
## 4   100     16
## 5    65      2

First of all, the author wishes to thank @ajing for reporting that weighted sampling use cases were not yet properly supported in sparklyr 1.3 and suggesting that they should be part of some future version of sparklyr in this Github issue.

Special thanks also go to Javier (@javierluraschi) for reviewing the implementation of all exponential-variate based sampling algorithms in sparklyr, and to Mara (@batpigandme), Sigrid (@Sigrid), and Javier (@javierluraschi) for their valuable editorial suggestions.

We hope you have enjoyed reading this blog post! If you wish to learn more about sparklyr, we recommend visiting sparklyr.ai, spark.rstudio.com, and some of the previous release posts such as sparklyr 1.3 and sparklyr 1.2. Also, your contributions to sparklyr are more than welcome. Please send your pull requests through here and file any bug report or feature request in here.

Thanks for reading!

Efraimidis, Pavlos, and Paul (Pavlos) Spirakis. 2016. “Weighted Random Sampling.” In Encyclopedia of Algorithms, edited by Ming-Yang Kao, 2365–67. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4939-2864-4_478.

Don Lemon, Georgia Fort arrested over Minnesota church protest



This story appeared in The Logoff, a daily newsletter that helps you stay informed about the Trump administration without letting political news take over your life. Subscribe here.

Welcome to The Logoff: The Trump administration is indicting two journalists for their coverage of a Minneapolis protest.

What happened? Don Lemon, a longtime CNN host who was fired from the network in 2023, and Georgia Fort, an independent journalist in St. Paul, Minnesota, were charged with conspiracy to deprive rights and interfering with religious freedom on Friday, along with several protesters.

The charges stem from their coverage of a Minneapolis-area protest earlier this month; activists interrupted a church service in St. Paul over a pastor at the church who works for ICE, while Lemon and Fort documented the protest.

The indictment, unsealed Friday, which charges 9 total people on the same two counts, alleges Lemon, Fort, and the other defendants “entered the Church in a coordinated takeover-style assault and engaged in acts of oppression, intimidation, threats, interference, and physical obstruction.”

What’s the context? The Trump administration tried and failed to charge Lemon at least twice prior to Friday’s indictment by a grand jury. Shortly after the protest, a federal magistrate judge refused to sign an arrest warrant for Lemon; when the Trump administration appealed that decision, it was also rejected by a federal district court judge and by a federal appeals court panel.

Three activists were also charged last week in connection with the protests; in one instance, the White House digitally manipulated a photo of Nekima Levy Armstrong to make it appear she was crying when she was arrested.

Why does this matter? The indictment of journalists is disturbing on its face, as are the lengths the Trump administration went to to secure the indictment. Equally alarming is the administration’s apparent giddiness: the White House touted Lemon’s arrest in an X post Friday morning, writing “When life gives you lemons… ⛓️” over a black-and-white photo of Lemon. It also referred to the “St. Paul Church Riots” (plural, though it was a single protest).

And with that, it’s time to log off…

This story made me smile at the end of a long week: people raised more than $8,000 for a school crossing guard in Chicago, Joe Sass, after he went viral for helping a student across a street flooded with icy slush.

“I like being a helper,” Sass told the Washington Post (it’s a gift link). “And I think if people could think of me as that, then I think that’s like one of the most beautiful things in the world. I’m just a friend out here helping my neighbors.”

Have a restful weekend, and we’ll see you back here on Monday!

NASA’s Artemis 2 mission to the moon puts Crew-12 SpaceX launch in a delicate dance



It is the best of times, and it is (far from, actually) the worst of times for NASA, with two big astronaut launches converging toward the same week, as a rare Arctic cold front pushes mission schedules into a logistical whirlwind.

This is a story of NASA’s highest-profile mission in more than half a century, the Artemis 2 astronaut flight around the moon, brushing up against the launch of SpaceX’s Crew-12 mission to the International Space Station (ISS). That liftoff has been moved up the calendar to replace the Crew-11 astronauts, who were forced back to Earth early due to an undisclosed medical issue with one of the astronauts.

Problem in multilevel (hierarchical) multinomial logistic regression



The predicted variable is a categorical response, named resp, with levels ‘1’, ‘2’, ‘3’, and ‘4’ (nominal labels, not numerical values). The predictor is a categorical/nominal variable named group, with levels ‘A’ through ‘K’.

 

Notice these aspects of the data:

  • The proportion of response ‘2’ equals the proportion of response ‘3’ within every group, and across all groups. Specifically, in every group, p(‘2’) = p(‘3’) = 0.25.

  • The proportions of responses ‘1’ and ‘4’ are symmetric reflections of each other, with p(‘1’|A) = p(‘4’|K), p(‘1’|B) = p(‘4’|J), and so forth.

 

Because of the multiple groups, this is a natural setting to try a hierarchical model that shares information across groups, and will provide shrinkage across groups. Because of the symmetry in the data, the hierarchical model should symmetrically shrink the response ‘1’ proportions closer to 0.25, and do the same for the response ‘4’ proportions.

I call brm() with the usual hierarchical
formula = resp | trials(n) ~ 1 + (1 | group)
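For concreteness, here is a minimal sketch of what such a call might look like (this is not the actual code; the counts below are hypothetical, constructed only to match the proportions described above, and the column names r1 through r4 stand in for the response levels ‘1’ through ‘4’):

library(brms)

groups <- LETTERS[1:11]                  # groups 'A' through 'K'
p1 <- seq(0.45, 0.05, length.out = 11)   # hypothetical proportion of response '1' per group
p4 <- rev(p1)                            # '4' mirrors '1', so p1 + p4 = 0.5 in every group
n_trials <- 400                          # hypothetical number of trials per group

d <- data.frame(group = groups, n = n_trials)
d$resp <- cbind(
  r1 = round(n_trials * p1),
  r2 = round(n_trials * 0.25),
  r3 = round(n_trials * 0.25),
  r4 = round(n_trials * p4)
)

fit <- brm(
  resp | trials(n) ~ 1 + (1 | group),
  family = multinomial(),
  data = d
)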

We can get the posterior predictions and make a plot:

Notice these aspects of the posterior predictions:

  • Contrary to the data, the proportion of response ‘2’ is not the same across groups, and the proportion of response ‘3’ is not the same across groups.

  • Contrary to the data, within every group, p(‘1’) = p(‘2’) = p(‘3’).

  • Contrary to the data, the proportion of response ‘1’ does not symmetrically mirror response ‘4’.

 

In the full document linked below, I explain why this happens and I propose a solution. Has this problem been identified before? Has this solution been proposed before?

To see the full description of the problem, click the link below to download the HTML file. Then find the downloaded HTML file on your computer and double-click it to open it in your browser:

https://drive.google.com/uc?export=download&id=1z_hGTzkkIlMJ0Tk2ONCH96bZh10l0gMr

(If you’re reluctant to click a direct download link, you can find the HTML document at the following link and then manually download it: https://drive.google.com/file/d/1z_hGTzkkIlMJ0Tk2ONCH96bZh10l0gMr/view?usp=drive_link)

 

I have posted this on the Stan Forums, so if you are interested in this topic you may want to follow the comments there:

https://discourse.mc-stan.org/t/problem-in-multilevel-hierarchical-multinomial-logistic-regression-with-brms/40882