Wiktionary:Beer parlour/2022/September

The Spanish Inquisition vs. CFI

When discussing whether use of a name is about the person/entity or figurative, I like to think of the example of Monty Python's Spanish Inquisition sketches. These start out with a character innocently saying "I wasn't expecting some kind of Spanish Inquisition", at which point characters in historical costumes burst through the door and say, dramatically, "Nobody expects the Spanish Inquisition!" Then they launch into a self-descriptive monolog starting with "Our chief weapon is surprise".

The whole logic of the sketches hinges on whether "Spanish Inquisition" refers to a heavy-handed interrogation (figurative) or a historical/fictional entity (non-figurative). So, when you see an entry for a named entity, ask yourself: is this making the same mistake that the Spanish Inquisition makes in the Monty Python sketches? Chuck Entz (talk) 15:24, 1 September 2022 (UTC)

The interesting question here is whether the figurative uses are figurative uses of the literal sense or rather literal uses of the figurative sense. And that is subject to some academic debate, from what I remember. As for Spanish Inquisition, what is interesting is that the figurative uses use the definite article the, as in "I agreed to answer a few questions, but I didn't expect the Spanish Inquisition." And a further question is to what extent the literal sense is kind of embedded in the figurative uses, expecting the reader/listener to know the literal sense. I am inclined to think that figurative uses of literal senses for persons are figurative uses of the literal sense and that we would be best served by having a single definition of the form "Literal sense definition, noted for characteristics X, Y, Z". And if we assume that a separate figurative sense is warranted, then the question is whether the literal sense should be relegated only to etymology despite often being the main sense of the defined term. That there are figurative uses of names of persons and groups, of that there is no question. --Dan Polansky (talk) 16:01, 1 September 2022 (UTC)

CEFR levels

I would like to know if it is possible to add CEFR levels to entries, as does the Cambridge dictionary. Backinstadiums (talk) 19:49, 1 September 2022 (UTC)

Wouldn't that have to be done by individual definition/sense?

Is there a source for such information or a set of criteria for determining a level for a definition? DCDuring (talk) 19:54, 1 September 2022 (UTC)

The Cambridge Dictionary lexicographic resources include them. There is this one too: https://www.englishprofile.org/wordlists/evp Backinstadiums (talk) 20:26, 1 September 2022 (UTC)

We'd probably have to revise at least one definition per entry to make it conform to the CEFR level. E.g., their first definition of iron ("a dark grey metal used to make steel and found in very small amounts in blood and food") corresponds to our def. 1: "A common, inexpensive metal, silvery grey when untarnished, that rusts, is attracted by magnets, and is used in making steel." Their second definition "a piece of electrical equipment that you use for making clothes flat and smooth" corresponds to our def. 4: "A tool or appliance made of metal, which is heated and then used to transfer heat to something else; most often a thick piece of metal fitted with a handle and having a flat, roughly triangular bottom, which is heated and used to press wrinkles from clothing, and now usually containing an electrical heating apparatus."

The problem with our def. 4 for CEFR purposes should be evident: two verbose NPs, each with a complex compound clause as modifier. In 2012 our def. 1 was "A metallic chemical element having atomic number 26, and symbol Fe.", brief but using terms not covered in elementary education and forgotten by many adults and failing to connect with everyday experience outside of classrooms. Even our current def. 1 fails as a definition in CEFR terms because some of the terms it uses (untarnished, magnet) are not themselves listed in CEFR. (We don't even use a defining vocabulary to attempt to simplify any of our definitions.) DCDuring (talk) 23:10, 1 September 2022 (UTC)

Imagine the fun of keeping these levels updated with every edit by an IP (and not just edits to the base article, but to the ones it derives its own level from). The phrase moon on a stick comes to mind. Equinox ◑ 23:14, 1 September 2022 (UTC)

Let's do it tho' Backinstadiums (talk) 10:28, 7 September 2022 (UTC)

Sounds like something that should be added to Wikidata instead. – Jberkel 08:09, 8 September 2022 (UTC)

Who will though? Backinstadiums (talk) 09:23, 9 September 2022 (UTC)

Category:English terms spelled with 0 etc.: yes or no?

We currently have 397 terms in Category:English terms spelled with 0, all of which have the category added manually. Module:headword has support for adding categories like this automatically, depending on the value of the standardChars field in Module:languages/data2; if a character isn't listed, a "terms spelled with" category is added. However, digits 0-9 are included in this field for English (same for many other languages, but not all), so the category isn't added automatically. IMO either it should be added automatically or not at all. Which one is correct? Benwing2 (talk) 04:00, 2 September 2022 (UTC)

Er, I would say that people should never be adding "terms spelled with X" manually because a bot can do it, and a person can make typos. (I also cry to think about human users wasting their time doing shit that a machine can do.) Equinox ◑ 04:04, 2 September 2022 (UTC)

@Equinox I completely agree; in any case we should remove the manually-added categories. What I'm asking (sorry for not being clear) is whether we should remove digits 0-9 from English standardChars, so that categories like this get autopopulated, or keep them, so that the categories get emptied. Benwing2 (talk) 04:13, 2 September 2022 (UTC)
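Editorial illustration: the two options in the question above can be sketched roughly as follows. This is not the actual Lua in Module:headword; the character sets, the function name `spelled_with_categories`, and the exact category naming are assumptions made for the example, and the real standardChars value for English may differ.

```python
import string

# Hypothetical stand-ins for the English standardChars value (assumption,
# not copied from Module:languages/data2).
STANDARD_CHARS_WITH_DIGITS = set(string.ascii_letters + string.digits + " -'")
STANDARD_CHARS_WITHOUT_DIGITS = STANDARD_CHARS_WITH_DIGITS - set(string.digits)


def spelled_with_categories(title, standard_chars, lang="English"):
    """Return one category name per distinct character of `title`
    that is not in `standard_chars`."""
    nonstandard = sorted({ch for ch in title if ch not in standard_chars})
    return [f"Category:{lang} terms spelled with {ch}" for ch in nonstandard]


# Option 1: digits stay in the standard set, so no digit category is produced.
print(spelled_with_categories("0-day", STANDARD_CHARS_WITH_DIGITS))
# Option 2: digits are removed from the standard set, so the category is automatic.
print(spelled_with_categories("0-day", STANDARD_CHARS_WITHOUT_DIGITS))
```

Under either option no manual category is needed: the category either follows from the character set or is not produced at all.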

We need a user survey to find out whether anybody has ever used "English terms spelled with X". I never saw the point of them at all. I know people always complain about anagrams, but I bet loads of people use that, to check Scrabble words and stuff. Who actually goes on the Internet to say "oh, today I want a list of words that contain the digit 7?" Nobody. That's who. John Q Nonexistent does that. Equinox ◑ 04:21, 2 September 2022 (UTC)

I find them interesting if they're weird. Digits are not weird. Theknightwho (talk) 10:10, 2 September 2022 (UTC)

I used to use these when doing things like checking terms spelled with æ or œ that were not categorized as archaic / obsolete, to find ones that needed to be, and I had also wanted to look over "terms spelled with '" for some reason I no longer remember, and was frustrated that category didn't exist because people thought it was too common / boring. But that was me doing cleanup work, I haven't used them in a while, and I know there are other (albeit time-consuming) ways of generating lists of "all German words spelled with x" with AWB if I need to; I don't know if a reader would find a category for "x" or "0" useful. Probably they're only interested in seeing what's spelled with weird characters, as TKW says.
Is it expensive to add these, Lua-wise? Once the module is already checking for ligatures, is it any more expensive to have it also check for 0? If it's cheap, maybe just add 0-9 and just make them hidden categories if we're concerned they'll clutter up the bottom of the page, but if it's expensive, it's probably not worth the bother. - -sche (discuss) 15:56, 2 September 2022 (UTC)

-sche's comment makes me realise that actually this whole picture isn't a category issue (really), it's a search issue. There might not be a way right now to say "find me pages whose titles include the é like in café", but there might as well be, because that is a search problem. It's not something we should waste our time or categories on. Equinox ◑ 16:02, 2 September 2022 (UTC)

MediaWiki has a reasonably good set of search functions all things considered, but it's not the most intuitive. Theknightwho (talk) 16:33, 2 September 2022 (UTC)

intitle:0 insource:"\=English" works. You can't use categories like Category:English lemmas and Category:English non-lemma forms in searches because they're just too big; apparently the search engine loads the entire category before applying the filters. Chuck Entz (talk) 17:05, 2 September 2022 (UTC)

Never mind. It seems to only find "0" in isolation. The intitle: would require a regex, but these tend to time out if not carefully designed. Chuck Entz (talk) 17:19, 2 September 2022 (UTC)
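Editorial illustration: the difference noted above, between finding "0" in isolation and finding it anywhere in a title, is the difference between whole-word and substring matching. One way around search timeouts is to run the regex offline over a list of titles. The sketch below assumes such a list is already available (for example, extracted from a dump); the sample titles and the helper name `titles_containing` are made up for the example.

```python
import re


def titles_containing(pattern, titles):
    """Return the titles in which `pattern` matches anywhere (substring search)."""
    compiled = re.compile(pattern)
    return [title for title in titles if compiled.search(title)]


# Hypothetical sample of page titles, for illustration only.
sample_titles = ["0", "0-day", "10-4", "R2-D2", "café"]

print(titles_containing(r"0", sample_titles))       # titles with a zero anywhere
print(titles_containing(r"[0-9]", sample_titles))   # titles with any ASCII digit
```

Restricting the result to English entries would still need a second check against the page text, which this sketch does not attempt.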

I mean at this point I'm gonna be seen as a troll: but can we find one single human user who wants to find "English terms with a zero in them"? No. Just drop this bollocks. It's obviously autistic nonsense, of no value to anybody. Equinox ◑

I went ahead and deleted all manually specified categories for terms spelled with 0 through 9 and am in the process of doing the same with various ASCII punctuation characters, all of which are in standardChars (especially useless categories like Category:English terms spelled with - and others). Benwing2 (talk) 05:47, 3 September 2022 (UTC)

@Benwing2: Category:Translingual terms spelled with less-than sign is useful, though. I do not agree with this removal. J3133 (talk) 06:17, 3 September 2022 (UTC)

@J3133 Fine, I can put it back. However, < and > are in standardChars; if you think it's useful to have those categories, you should lobby for removal of them from standardChars. I'm not about to manually add that category to all pages with a < or > sign in them. Benwing2 (talk) 06:20, 3 September 2022 (UTC)

I did add the category manually to all pages earlier, as Translingual does not have standardChars. J3133 (talk) 06:24, 3 September 2022 (UTC)

@Benwing2 I find this to be another thing that was implemented way too quickly with little discussion. I have found the categories interesting, it's not "autistic nonsense", and Wiktionary as a whole doesn't even really know what most readers do to begin with, nor do they really use Beer Parlour to comment on issues like these. I was barely given more than 24 hours to respond before the change went in. I'd really appreciate it, as I've mentioned before, if changes like these were given more time and discussion before they go in, especially if they affect a bunch of entries. It's weird that things can move extremely fast on one end and then painstakingly slow on another. AG202 (talk) 17:32, 3 September 2022 (UTC)

@AG202 I'm sorry, I just removed the manual categories. I agree maybe I acted too quickly, but I still maintain it's not useful to have manually added categories like this, and they are necessarily incomplete; most of these categories were added long ago, and most terms added in the past few years containing digits were never in the categories. If you think we should have 'terms spelled with 0-9' categories, the correct way is to use the standardChars mechanism and remove 0-9 from the list. I haven't touched any existing standardChars, and I agree it should require consensus over a week or so to do so. Benwing2 (talk) 17:38, 3 September 2022 (UTC)

Mongolian terms spelled with ъ and щ

Discussion moved to WT:GP.

Closing RFD discussions using the strength of the arguments

Some hold that RFD discussions should be closed based on the strength of the arguments presented rather than on the "keep" and "delete" post counts. This isn't workable, as per the following.

Imagine a RFD nomination with the rationale "sum of parts". Three additional editors post "Delete as SOP". A keeper posts a keep with an elaborate explanation why the term is not a sum of parts. A month passes. The keeper closes the discussion with "RFD kept: the arguments for keeping are stronger". The keeper is honestly using their judgment to assess the strength of the arguments; it is the same judgment they used to post the keep in the first place. The number of discussion participants is 5 (1 + 3 + 1), but the number of arguments is 2 since the 3 additional pro-deletion editors did not post any additional arguments. In fact, since only the strength of the arguments should matter and not vote counts, the deleters should not have posted anything since they did not add anything to arguments or their strength. This is an absurd way to administer a RFD process; it turns the discussion participants into having a mere advisory role, removes all decision making authority from them and places all the authority on the single closer.

The above also shows that our RFD process has so far been based predominantly on vote counting: 1) participants usually post boldface keeps and deletes; 2) participants add their posts even when that adds nothing to the arguments already presented, serving to increase the number of votes and not number of arguments; 3) the RFD closers sometimes present explicit vote counts as part of closure statements.

It is one thing to allow an occasional vote-count override to handle abuses of the process, it is another thing to vest the closer with argument-assessment powers.

What is Wikipedia doing? Wikipedia's notion of the consensus process is Orwellian and confusing. They pretend to be closing discussions based on the strength of argument, per W:Wikipedia:Consensus: "Consensus is ascertained by the quality of the arguments given on the various sides of an issue, as viewed through the lens of Wikipedia policy." This is unworkable as I have shown above. What happens in practice is some kind of indeterminate mix of vote counting and strength of the arguments. It usually does not happen that a lone dissenter is allowed to close a discussion based on their assessment of the strength of the argument and override the near-unanimity, but such an override is a direct logical consequence of the quoted policy. Their process is Orwellian in so far as they redefine the common word "consensus" to mean something which it does not mean outside of Wikipedia. The actual consensus-based processes in business and other organizations involve discussion where the discussion participants are trying to discuss and exchange arguments until they reach a general agreement (not unanimity), but what is reached is a state in which a supermajority agrees with the outcome. It is good to emphasize the need of arguments rather than bare thoughtless voting but at the end of the day, vote counting is what defines whether there is consensus. Our processes cannot approach the business consensus processes since there is no real-time interaction.

What can be done to improve use of good arguments? The following policy comes to mind:

A RFD post that contains no rationale or only an obviously nonsensical rationale should be stricken out and discounted. "Delete per nom" and "Delete as SOP" count as having rationales, and so does "Keep per Joe Hoe".

This would improve things just a little since many poor rationales are not obviously nonsensical but still poor, but it would allow bare keeps and bare deletes to be discounted. A similar policy could be adopted for formal votes, for which from what I remember there would be quite some opposition; too many people seem to like bare votes too much. Dan Polansky (talk) 07:13, 3 September 2022 (UTC)

Stop trying to start the same discussion again because you didn’t get your own way last time. Theknightwho (talk) 11:39, 3 September 2022 (UTC)

Last time, the discussion was initiated with a question of numerical threshold. Opposers required flexibility and no firm threshold but most did not say the closer should only consider the strength of the arguments being made. Here I am arguing against a specific principle in a way that was not covered in the previous discussion. I wonder what kind of objections can be raised against the points that I made. I don't think anyone can seriously maintain that the strength of the argument should be the sole factor per the above. --Dan Polansky (talk) 12:59, 3 September 2022 (UTC)

You gave an example of someone with a conflict of interest closing a discussion, as an argument against the principle of closing based on the strength of argument at all. You then made a leap of logic to say that that means we must, therefore, be generally closing based on numbers. This does not follow, because closers can take into account multiple aspects of the discussion.

The point that “strength of argument should be the sole factor” does not entail that we must therefore have a numerical threshold, which would necessarily entail that strength of argument cannot influence the closer at all.

As a side point - please stop endlessly suggesting policy changes that have zero chance of being implemented. There needs to be genuine momentum before a vote happens - and you don’t have it at the moment. Theknightwho (talk) 13:31, 3 September 2022 (UTC)

The modification to remove the alleged conflict of interest is trivial: an editor who supports keeping does not post keep and then closes discussions as they see fit in favor of keeping, exercising undue authority. And the conflict of interest should not really be the problem anyway since having posted keep does not disqualify the editor as a competent judge of the strength of the arguments. It is not anything like conflict of interest in a legal sense.

No one has proposed how to combine the strength of the argument with vote counts; if we had such a proposal, we could send it to a vote, but there is none. I have no idea how to do it. I am all ears. Ideally I would like to see an example closure of some real-world example where both vote counts and argument strength were taken into account. --Dan Polansky (talk) 13:52, 3 September 2022 (UTC)

How about you WT:AGF in the closer? Theknightwho (talk) 14:14, 3 September 2022 (UTC)

I am assuming good faith in the closer. The closer uses their best judgment to assess the strength of the arguments, with the intention to make dictionary better. Under the strength of argument principle, the closer is under no obligation to consider what others think best. The problem is it gives the closer the sole authority. And again, no one has proposed how to combine the strength of the argument with vote counts, and if someone has a formulation and application example, I am all ears. --Dan Polansky (talk) 14:19, 3 September 2022 (UTC)

But nobody said we should only use the strength of argument criterion. Theknightwho (talk) 14:20, 3 September 2022 (UTC)

The quoted Wikipedia passage suggests so. And if we should use a combination, how? What is an example of an RFD discussion evaluation using a combination of the two principles? Can it be seen somewhere? --Dan Polansky (talk) 14:27, 3 September 2022 (UTC)

This isn’t Wikipedia. An example is the closer considering the numbers and the strength of the arguments made, and explaining if the result isn’t immediately intuitive. Theknightwho (talk) 14:41, 3 September 2022 (UTC)

Where can I find an actual example? Which RFD closure is an example? Is someone else willing to draft a description of the combination of the two principles or work out an example? --Dan Polansky (talk) 15:14, 3 September 2022 (UTC)

Frankly, having now closed a batch of RFD nominations, the idea that I as a closer should read through the discussion and properly think about the strength of the arguments made and have a look at possible evidence, possibly also checking other dictionaries, seems remarkably impractical. Counting votes is sometimes tedious enough. We should be glad that closers want to do the intellectually trivial closure work in RFD instead of requiring them to assess the arguments. --Dan Polansky (talk) 18:38, 3 September 2022 (UTC)

The following combination of vote tallying and strength of argument does not work either: Let the method be that the closer considers the arguments made and then disregards those votes that make or refer to arguments that are weak, only tallying those that are strong. This is not solely a strength of argument consideration since the number of votes still makes a difference for the votes that make a strong argument. The problematic scenario is the same as above: 4 voters post "Delete as SOP", 1 voter posts "Keep" with an elaborate non-SOP argument, and the closer closes the discussion as "RFD-kept: the pro-deletion votes were discounted as having a weak argument". It seems that "weak argument" is too subjective a filter, maybe not really subjective but practically subjective by differing too much between editors. The differences in votes cast are themselves evidence of the differences in the strength of argument assessments between editors. A filter that is more realistic is to remove votes with "no argument" or "obviously nonsensical argument", although one can argue that bare "delete" votes are equivalent to "delete per nom" or "delete as SOP" (SOP is the most common rationale in RFD) so even bare deletes are probably not worth discounting. Bare "keep" votes can be understood as "keep as non-SOP"; the question is whether the keepers have more of a burden of proof than the deleters. No other method of combining strength of argument with vote tallying comes to mind. If someone has an idea, I am very eager to hear it. --Dan Polansky (talk) 10:36, 5 September 2022 (UTC)
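Editorial illustration: the scenario described above can be worked through as a small tally. This is only a sketch of the three filters mentioned in the thread (plain count, discounting posts without a rationale, discounting arguments the closer finds weak). The `Post` record, the `tally` helper, and the set of "weak" rationales are hypothetical and stand in for a judgment that, as the comment argues, differs between editors.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    editor: str
    stance: str     # "keep" or "delete"
    rationale: str  # empty string for a bare vote


def tally(posts, counts=lambda post: True):
    """Count stances over the posts for which `counts(post)` is true."""
    return Counter(post.stance for post in posts if counts(post))


# The scenario from the comment: four "Delete as SOP", one reasoned "Keep".
posts = [Post(f"deleter{i}", "delete", "SOP") for i in range(1, 5)]
posts.append(Post("keeper", "keep", "elaborate non-SOP explanation"))

plain = tally(posts)
with_rationale = tally(posts, lambda p: bool(p.rationale.strip()))

# Assumption: this particular closer regards "SOP" as a weak argument.
weak_for_this_closer = {"SOP"}
by_strength = tally(posts, lambda p: p.rationale not in weak_for_this_closer)

print(plain, with_rationale, by_strength)
```

The first two tallies agree (4 to 1 for deletion), since every post states a rationale; only the closer-dependent filter flips the outcome, which is the subjectivity the comment objects to.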

Making adding sources default

Pretty bold proposal but here goes nothing: What about making it obligatory to add one reference, quote, mention or similar to any entry when creating it?

I see a lot of upsides to this: We will have less clutter on RFV (which would only be needed for rfv-sense, validity verification and WDLs), the reader would always know where we got the entries from, and it's just generally a good idea to double-check any entry you create with the sources we have.

If, for some reason, there are languages out there with so little documentation at all, that we could potentially get a native speaker on the wiki who cannot find any references for words that are in widespread use (so, this word could pass CFI without any references), then we could make a list of such languages and exclude them from this policy, but from my experience such languages are either extremely rare or nonexistent.

Obviously, this proposal doesn't cover reconstructions, which are often justifiably OR, but basic attested languages need to be able to be attested anyway, so why not do it when creating the entry? I feel like I'm overlooking some huge flaw in this (because why wouldn't we have that policy already?), but I can't find it. I'm eager to know your opinions on this. Thadh (talk) 14:36, 3 September 2022 (UTC)

This is something I already practice. If I am adding a word not from a source, I immediately quote it so as to avoid the HEADACHE that is opening Non-English RFV, and even when I am adding a word from a source I check if it shows up in corpora anyway. I think one of the biggest ways to increase the reliability of Wiktionary is to show our work - either from a reference or with quotes. It's more work but there are SOME tools to help with that, like QuietQuinton and reference templates (yes you have to make it, but it's usually not that hard). Vininn126 (talk) 14:45, 3 September 2022 (UTC)

I’m okay with this so long as it includes references. I’d be less okay with it if cites were necessary, due to the LDL issue. Theknightwho (talk) 14:53, 3 September 2022 (UTC)

This isn't an "and" argument, it's an "or" argument. That is the given language should have ONE of the listed items. Vininn126 (talk) 14:57, 3 September 2022 (UTC)

I’m aware - I was just making my position clear. Theknightwho (talk) 15:03, 3 September 2022 (UTC)

Obviously, a great deal of the words I add I would probably not be able to quote, but I definitely add a reference in that case. Thadh (talk) 14:57, 3 September 2022 (UTC)

In short this thread comes down to do you believe in quantity vs quality. Vininn126 (talk) 22:59, 3 September 2022 (UTC)

I like the proposal but I also see the requirement as too much of an additional burden, so I don't really know at this point. I try to indicate sources in the edit summary but do not bother to format the quotations since it is such a hassle. If new entries (after some cutoff date) failing to meet the requirement were speedy deleted, we might lose some good contributions. And the reader is not obliged to believe an entry that has no substantiation. --Dan Polansky (talk) 15:18, 3 September 2022 (UTC)

First of all, you really should learn to format sources and start doing it - it's fairly easy with a reference template, and nobody looks at the page's history for references.

Second, I wasn't talking about deleting anything just yet: If we agree to only add referenced entries from now on, that would already amount to quite a lot of good, and we could talk cleanup later.

The third point, however, quite frankly baffles me: The reader doesn't have to trust any of our entries, but we certainly want them to, don't we? Thadh (talk) 15:25, 3 September 2022 (UTC)

I know how but it is laborious. And I have my priorities. We should not trust anything unsubstantiated either. If we require substantiation for all entries, we can have no entries to serve as hypotheses yet to be verified. Sometimes you are better off finding an unsubstantiated hypothesis than finding nothing. And the practicalities are not obviously surmountable: the amount of work that would need to be done to substantiate, say, 20% of our entries would be enormous. We could start by running a bot to add OneLook to all entries for which OneLook has some of the classical dictionaries (not all of OneLook does that); that would alone increase the volume of substantiation hugely, without a need of a policy. But someone has to run the bot and design it: the bot has to parse the OneLook page to see which dictionaries are there. This could be further done for {{R:GNV}}. For other languages, bots could be adding references for the reference templates that we have collected; that again would hugely increase the level of substantiation without any policy change, but again requires a bot designer and operator. This would be a start. To do all this manually is just an unrealistic effort. And it does not require any policy change. Once that would be done, we could determine how many entries remain without substantiation and that would tell us how much more manual effort is required and whether a policy like the one proposed is worth it. --Dan Polansky (talk) 15:35, 3 September 2022 (UTC)

I'm inclined to disagree. Sometimes I have this phase where I churn out a ton of German compound entries in a short amount of time for words that obviously exist and that nobody would seriously doubt the existence of (because proof is one Google search away; they're however not always found in the Duden und Co.). I can't help but see this proposal as incurring a lot of unnecessary extra work which would only lead to me being able to create fewer entries in the same amount of time (which clearly outweighs the positives IMO). To give an example, I can create entries such as ganzstündig in less than a minute, but finding a reference or a quotation can itself already take a minute. I assume that Surjection who I regularly see create similar articles for Finnish compounds would feel the same way.

Further, when (or rather, IF) I finally get around to documenting Alemannic, it would be annoying having to find sources for the most basic of words (seeing that we don't even have Alemannic German chaufe or hebe). The only really good reference (Schweizerisches Idiotikon) is rather unwieldy to even just read. This could however be avoided if this proposal doesn't apply to terms that pass WT:CFI by the "clearly widespread use" clause. — Fytcha〈 T | L | C 〉 15:42, 3 September 2022 (UTC)

For German, GNV works as well, so GNV could work as a minimal substantiation of existence of a form, and could be added by a bot. But GNV only covers a couple of languages, so the substance of the above stays valid. --Dan Polansky (talk) 15:43, 3 September 2022 (UTC)

@Fytcha: See the third point of my original post regarding Alemannic. On ganzstündig, you seem to have added a quote, so I don't really see an issue; Do we really want to prioritise speed over quality? I personally think a user would rather see one translation he can trust than three he cannot. Moreover, I'm not saying you need to add some kind of perfect quote to illustrate everything, just one simple (even untranslated) quote that fits the CFI would be fine by me. Thadh (talk) 15:54, 3 September 2022 (UTC)

I think it’s important to separate the issue from CFI, too. While one route for CFI is that a term must be in clear, widespread use, that could still apply to terms for which a source has been added but not a citation. It’s relatively trivial to add dictionary source templates - at most you might need to specify a page number, or some code for the URL to work. It’s pretty unlikely that a common German term is not going to be in Duden. Theknightwho (talk) 16:04, 3 September 2022 (UTC)

"On ganzstündig, you seem to have added a quote, so I don't really see an issue;" Right, that was not the ideal example to provide (Islamologe is a better one, no Duden nor DWDS entry for this one either) but I think the point I was making was still clear: ganzstündig (minus the quote) took <1min to create, but then adding a quote can take that same amount of time itself (or even more). So in essence, the proposal drastically reduces my number of entries per time without really decreasing my number of mistakes per time (it however lends more credence to the contents of the entries, I'll give you that).

"I personally think a user would rather see one translation he can trust than three he cannot." It's likely this point where we diverge in opinion. I'd personally take "Fytcha's 30k word dictionary without quotes" over "Fytcha's 10k word dictionary with a first page quotation from Google Books" but this opinion is entirely informed by my personal use case for dictionaries (i.e. using them to understand text (so the text's context serves as a sanity check already) and using multiple dictionaries in conjunction). I can totally understand though why somebody would prefer the latter. — Fytcha〈 T | L | C 〉 16:18, 3 September 2022 (UTC)

Would you also accept something I've been doing with first attestations on Polish terms a la akcesoryjny? It's as good as a quote. (Ignoring the fact it has other citations - I mean imagine an entry with ONLY that.) Vininn126 (talk) 16:21, 3 September 2022 (UTC)

We don’t prioritize speed over quality. The appendix of a page just does not as a rule partake in its quality, especially if you make it a rule, but it is only a fig leaf. It may even distract from creating quality, hence one does not even bother with references, which might add too little or cause more confusion. And as Fytcha hinted, in the area of compounds there are more common German terms than the other dictionaries are willing to include. One could just parse a legal commentary and get a list of thousands of such words that are familiar to jurists at least in context but not added by Duden and competitors, for being too terse. Even linguistics has left a lot of stuff that has passed by the internet references, I noticed when it was 2022 and I had to create words like potamonym, ichthyonym, dendronym, which you have no difficulty searching for. So this is the matter with every science, and the bar should not be higher to casually add them, for example while sitting in the library and actually doing something else, when you shouldn’t be browsing Wiktionary. Fay Freak (talk) 17:02, 3 September 2022 (UTC)

I don't understand your issue - if you found this term in a book you can just quote the book, can't you? You'd have to verify the term if it were sent to RFV anyway, wouldn't you? Thadh (talk) 19:25, 3 September 2022 (UTC)[reply]

One of the issues is that quoting is a faff - there's a reason why we automate references when possible. I also use plenty of jargon at work that is difficult to find citations for - particularly when it's a niche use of an otherwise common word. Theknightwho (talk) 20:56, 3 September 2022 (UTC)[reply]

But how often does that actually happen? Is there really NO way of finding at least one citation/quotation? Vininn126 (talk) 21:01, 3 September 2022 (UTC)[reply]

I didn't say it's not possible - I said it's a faff. Theknightwho (talk) 17:37, 4 September 2022 (UTC)[reply]

I agree with Fytcha that this might be taking up too much time when creating new entries. There is more fuss to do for Chinese, beyond copy-pasting and formatting quotations: the words have to be manually spaced and formatted for {{zh-x}}, and the auto-romanization has to be checked for correctness (it is often incorrect, so I had to go through more stuff); this part alone takes at least double, perhaps triple, the time used for writing the entry itself. For some entries it might be more tedious, such as hurt#Chinese: without the quote it might take less than two minutes, but with the quote it took at least 15 minutes for me. I had to manually transcribe what was said in the film (which is different from the subtitles since subtitles are always in Chinese not Cantonese, and the audio quality isn't that good) and repeat the above process, not to mention actually finding the source, which also takes up a lot of time.

Nevertheless, I do see the benefits brought by requiring sources when creating new entries, which might be taking less time when compared to an RFV, but I don't think it is worth it for every single entry, perhaps only for the ones that are likely to be nominated for RFV. Instead, I would suggest that this should be something voluntary and recommended, but not mandatory. – Wpi31 (talk) 17:47, 3 September 2022 (UTC)[reply]

This isn't directly an impediment to the proposal, and I'm wary of saying it at all because w:WP:BEANS, but you may recall that a few months ago a user added a word with fake citations, which took a while to uncover. (And fake cites can be hard to uncover, since not all books are digitized, so just being unable to find a cite in Google Books doesn't necessarily mean it's not a real cite; someone found a real cite of Thing that wasn't findable online by happening to be reading the book at the time¹.) Right now, because people can add words without needing to add cites, the pressure to fake cites is low(er), and most cites are added by contributors who've been around a while, who are presumably less likely to add fakes. If we require every new user who wants to add a word to also include a cite, not only are we likely to get a lot of crappy/unusable cites of random webpages (though what is so crappy as to be unusable is less clear these days), but the pressure/benefit to add fake cites goes up.
I'm also not sure the benefit is worth the extra burden it puts on contributors. I generally add cites whenever I'm adding an obscure word, or one likely to be challenged, and I often add them in other cases too, but having to always add one even for common and obvious words would be more tedious. Meh. - -sche (discuss) 20:39, 3 September 2022 (UTC)[reply]

Sure. I get it. I still repeat, this feels like a different issue that I would love to hash out. It's just not part of this discussion. Vininn126 (talk) 20:42, 3 September 2022 (UTC)[reply]

Based on all the other feedback, my opinion is solidifying into a firmer "oppose"; this adds a burden on good-faith contributors, doesn't impede either inept or bad-faith crappy contributors who we already see just paste whatever reference templates were on the page they were copying as a model into their new entry without checking whether the reference has the word they're now creating, and makes it harder for both editors and readers to spot entries that are suspect or need improvement because it gives everything the veneer of being referenced whether it is or not. "Add a cite or reference when adding an entry" as an ideal for regular editors to aspire to? Sure. But as a rule to require in all cases? No. - -sche (discuss) 19:30, 4 September 2022 (UTC)[reply]

I'll probably be repeating what most of the people have already said, still, I disagree. I enjoy references and quotes very much, I add them wherever I can. Taking time to make an entry, and prioritizing quality over quantity is a good idea, I believe, though I read some prefer being fast. What I most certainly believe is a bad idea though, is forcing people to put refs at the end of a page, because, since reference templates can be a thing, you can just slap some of them under your page and pretend you did your research. Compare uni#Italian which contains literally all of the Italian ref templates, and not a single one of them actually links to the intended word. That's an extreme example, but it is very common to see ref templates linking to nowhere, and even more common to see ref templates linking somewhere that gives way more information than we are displaying, clearly showing that the presumed source hasn't actually been used as a reference. I only add ref templates if I actually got the information from there, if not, they should be under Further reading. When I see a refless page, I immediately understand 'Oh this needs work', while if every page has refs, I would need to check them every time to see if they're serious or not. This doesn't seem like it will improve the RFV practices, and (maybe an exaggeration) might even make them slower. Anyone could still make up any (e.g.) Italian word they want, nothing stopping them. Just the additional step of having to type {{R:it:Trec}} at the end of it. Catonif (talk) 22:09, 3 September 2022 (UTC)[reply]

I think this is a deeper issue with a LOT of nuance that should be discussed. Vininn126 (talk) 22:34, 3 September 2022 (UTC)[reply]

It's an issue that will grow tenfold if this goes through. Also I wouldn't want it to be hard for newcomers to make a page. Catonif (talk) 10:49, 4 September 2022 (UTC)[reply]

I agree with your original point, but I definitely think that it's a good idea to teach newcomers to use references from the start rather than have them make hundreds of stub entries that nobody can use. Thadh (talk) 12:28, 4 September 2022 (UTC)[reply]

Just on your point about sources that contain a lot more information than we're showing - I'm guilty of this on occasion, but it doesn't necessarily mean the source wasn't used. In adding Mongolian terms, sometimes I really don't want to spend the time adding 12 senses (10 of which are very niche), when the important thing for the language right at the moment is to get the main senses down. Theknightwho (talk) 17:47, 4 September 2022 (UTC)[reply]

So the priority is to churn out as much as possible ignoring the corner cases? I'm not sure I agree to that. Vininn126 (talk) 17:56, 4 September 2022 (UTC)[reply]

No. The priority is to, well, prioritise. I would much rather a smaller language had wider coverage that made it somewhat useable, than deep coverage over a much smaller number of lemmas. Theknightwho (talk) 03:42, 5 September 2022 (UTC)[reply]

@Theknightwho: You're right, it doesn't clearly show that the source wasn't used, it just hints towards it. And you're also right that it is not always the priority to list every possible sense. Catonif (talk) 12:20, 5 September 2022 (UTC)[reply]

I would support this move with a few caveats: we’d need to have a list for each (major) language where folks can easily find quotes & references (some languages already do this I feel), and then also, we’d need more active editors in general. I try to do this as much as I can with words like ᄒᆞ다 (hawda), but as seen with that entry, it takes time. (And there’s also the culture issue that’s a separate but related topic) AG202 (talk) 23:20, 3 September 2022 (UTC)[reply]

This is something I've been thinking about. Many print dictionaries have bibliographies out of necessity. We do at times as well. While I don't think we should create templates for each journal, book or whatever, we should at least make them for the more prevalent ones. We could have a page where we save them - perhaps language considerations or a separate one. Vininn126 (talk) 23:25, 3 September 2022 (UTC)[reply]

Such templates are often saved in categories such as "<lang> quotation templates" and, especially for dictionaries, "<lang> reference templates". In principle there's also the system of Module:Quotations, but I'm not having much joy in making it work for me with collective works such as translations of the Bible. (The Bible should have the advantage of there being a public domain English translation, without relying on the USA's legalisation of piracy of the Authorised Version.) --RichardW57m (talk) 13:49, 5 September 2022 (UTC)[reply]

I oppose this proposal because I think it will be an undue burden on new users, who often have a hard enough time using the site as it is, and also on highly active users who would not find it difficult, but might find it tiresome. —Soap— 17:54, 4 September 2022 (UTC)[reply]

The problem is that most people are not lexicographers. So the premise of the project is already very exclusionary. I think trying to invite people to edit who don't even know the difference between prescriptivism and descriptivism is a very bad idea. Vininn126 (talk) 17:59, 4 September 2022 (UTC)[reply]

"I think it will be an undue burden on new users"-- I agree with this. You've got to give them a taste of the action, even if they can only do limited work. For instance, I don't want to discourage someone making a slightly malformed page like Xiahuayuan, which I can then fix (see the Edit History). Also, there are words I'm only vaguely familiar with that are indeed words- Chinyang. I'm not in a position to do full cites on it or assess the one cite I did put on there. --Geographyinitiative (talk) 12:53, 5 September 2022 (UTC)[reply]

I oppose making the inclusion of a reference or citation mandatory based on the points made above about it providing additional burden for careful good-faith editors without posing much of an obstacle for bad-faith or careless editors who can simply misuse reference templates to create fake reference links.--Urszag (talk) 22:26, 4 September 2022 (UTC)[reply]

It would run counter to my practice when adding synchronously derivable Pali words. (Pali is an LDL.) I will furnish a quotation for the word I am intending to add, but for the words from which it is derived I do not struggle to find a quotation. Usually in principle I can find references to the word in texts from the Pali Text Society dictionary, but different citable versions have different numbering systems, and then providing a translation without breaching copyright is another major effort. If furnishing a quotation becomes necessary, I will be strongly tempted to simply omit the translation. Another solution will be to simply leave the immediate source(s) of the word as red links - or even as default blue (optionally orange for logged in users) misdirections. I don't think these labour-saving tricks will improve my contributions. As I recall, dictionaries are not admissible evidence for meanings. --RichardW57m (talk) 11:07, 5 September 2022 (UTC)[reply]

By the way - unrelated to this - but dictionaries are admissible for LDLs, as long as the community of editors in that language agrees that they are: "the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention," (WT:CFI#Number of citations). In practice, this list is often not written down and just agreed upon. Thadh (talk) 12:53, 5 September 2022 (UTC)[reply]

What's a 'community of editors' in a language? I see no active mechanism for contacting or joining such a community. (Although {{wgping}} seems set up for such a function, it's not actively used.) --RichardW57m (talk) 13:25, 5 September 2022 (UTC)[reply]

Community of editors is just a fancy name for "people that edit the language". There's no way to officially define it yet. Thadh (talk) 13:25, 5 September 2022 (UTC)[reply]

It would make sense to require some evidence for the existence of a word. Perhaps we could accept a dictionary entry, even though it might not be valid for an RfV challenge. --RichardW57m (talk) 11:07, 5 September 2022 (UTC)[reply]

That's what my original proposal was (hence reference, quote or mention). But it seems even that might be a problem for many editors. Thadh (talk) 12:47, 5 September 2022 (UTC)[reply]

OK, that's tolerable. --RichardW57m (talk) 13:26, 5 September 2022 (UTC)[reply]

For inflected forms etc, would a link back to the lemma suffice? I have three cases particularly worthy of inclusion in mind:

Inflected forms in headwords of lemmas. For example, in Welsh wyth ar ddeugain, ddeugain is the soft mutation of deugain, and I would rather say that the lemma derives from wyth, ar and deugain. It seems superfluous to supply a quotation etc. for ddeugain, which is already linked to from deugain and links back to it in its definition.
Alternative citation forms. For Pali nouns and adjectives, we make the stem the lemma, but alternative traditions make the nominative singular (masculine) or, in the case of nouns in -tar, the genitive/dative singular in -tu.
Homographs of other terms.

For languages with multiple writing systems, there are also the subsidiary lemmas that are homographs of other terms. For these, it should suffice, for creating though perhaps not for retaining, to have a link to (and preferably back) from the main lemma. (There may also be language-specific restrictions.)

I would describe all these terms as subsidiary forms. --RichardW57 (talk) 03:12, 7 September 2022 (UTC)[reply]

I have always understood that the existence of inflected forms is to be taken as evidence for a lemma. Vininn126 (talk) 08:19, 7 September 2022 (UTC)[reply]

This discussion was just for lemmas, since inflected forms don't need to be verifiable if they are regular. Thadh (talk) 08:44, 7 September 2022 (UTC)[reply]

I can't see the restriction to lemmas. Do we even have a mechanism for challenging irregular forms until they have their own entries? Even systems of alleged regular inflections may need challenging - Pali grammars conflict on the rarer parts of the system. Welsh plurals, Arabic masculine plurals and Latin 3rd conjugation perfects notoriously lack regular forms. --RichardW57m (talk) 11:35, 8 September 2022 (UTC)[reply]

I would be happy to give this proposal an unconditional support in a modified form: 1) If an authoritative reference exists, it shall be provided. This is easy to do, not laborious at all. 2) If the form is a fairly transparent closed compound that is in Google Ngram Viewer, GNV can be provided and this is sufficient. 3) If only attesting quotations in use exist to support the entry, they can be represented in an abbreviated form using some template, which is only to provide the author and the year or the title and the year. This would not be too laborious to enter, and would show the reader what kind of evidence we have used. It would be additional work, but not too much. The abbreviated forms would then later be expanded by whoever finds it worthwhile. --Dan Polansky (talk) 08:37, 7 September 2022 (UTC)[reply]

I would be fine with that. Again, the whole point of this proposal was to give our readers some - however small - proof that we didn't pull the entry out of our arses and to give them anything to go by if they want to verify the entry. Thadh (talk) 08:45, 7 September 2022 (UTC)[reply]

I see your purpose and support it, as long as it does not become too laborious. If the above or something similar gets approved, this will have been a very good initiative. People objected on the grounds that editors will have it easy to fake evidence, and I do not have a good response to that; when one only gives the author and the year, it is much harder to search than with an example sentence. One thing is for sure: if there was a community-approved template via which I can provide authors and years without quoting the passage, I would be happy to use the template without being forced to do so. If these concerns prevail, the policy could use the language "Editors are encouraged to do X", and that would still be an improvement over what we have since I would then be able to "encourage" editors on their talk page without being accused of impropriety. There would be a template for me to substitute on the editor's talk page, containing a refined polite request doing the encouragement to provide sources. As weak as it may seem, it would be progress. --Dan Polansky (talk) 09:08, 7 September 2022 (UTC)[reply]

There is {{rfquotek}} serving a similar purpose, but it does not allow giving the year, and it expects the author to be a key into a dictionary of Webster 1913 authors, which we do not want. I don't know what "k" at the end stands for. The template could be called {{quote-abbr}} and be used like {{quote-abbr|en|Jeremy Bentham|1890}}, {{quote-abbr|en|Jeremy Bentham|1890|author2=J. S. Mill}} and {{quote-abbr|en|1890|title=Treatise Concerning Things of Great Utility}}. Other names coming to mind are {{quote-stub}} and {{quote-incompl}}. And of course, we could follow the late trend of short template names and call it {{qa}}, {{qs}} or {{qi}}. --Dan Polansky (talk) 10:02, 7 September 2022 (UTC)[reply]

Ease of faking: The proposed evidence is very easy to fake, but Wikipedia's printed references do not fare much better: most readers will not have access to the referenced printed reference works and will not be able to verify that the statement traced to an inline reference is really supported by the source. And even if they have the source, there is often no passage or not even page number, so it is laborious to conclusively show that a given statement is not supported by the given reference. We should require {{qa}} to be pointing to something that is online, or else it will be hard to remove when an online search finds nothing promising. --Dan Polansky (talk) 10:14, 7 September 2022 (UTC)[reply]

I think restricting us to online sources is a bad idea. Vininn126 (talk) 10:19, 7 September 2022 (UTC)[reply]

Restricting {{qa}}, not us in general, since it provides so little identification. {{quote-book}} can use offline sources. But if people want to allow offline sources for {{qa}}, I will not oppose because of that, I just think it unwise. --Dan Polansky (talk) 10:23, 7 September 2022 (UTC)[reply]

Is this intended only to apply to entries (lemma L2 headers, actually), rather than to each etymology and each definition? In English, increasingly, marginal and spurious definitions for well-attested words are being added. Nothing short of attestation really addresses this problem in a readily monitorable way. DCDuring (talk) 13:50, 7 September 2022 (UTC)[reply]

Yes, this was intended for when one is creating an L2 lemma entry. It seems like a good idea however to just add a quote to strange definitions that most people wouldn't know regardless. Thadh (talk) 14:19, 7 September 2022 (UTC)[reply]

Is the proposal that an attesting quote be mandatory for each added definition? Would a footnote to a reference be sufficient? DCDuring (talk) 15:09, 7 September 2022 (UTC)[reply]

Again, the original proposal was adding either a quote to one definition or a reference to the entire entry. Is the "good idea" part of the current version of the proposal? Thadh (talk) 15:14, 7 September 2022 (UTC)[reply]

The original proposal was for an entry. Whether it was subsequently modified, I could not tell since the discussion is TLDR. The word definition only came up in these last few comments. DCDuring (talk) 16:14, 7 September 2022 (UTC)[reply]

Yeah, the word definition thing was just me saying it's generally a good idea to add sources to definitions as well, if it might be difficult for readers to find any verification. Thadh (talk) 16:20, 7 September 2022 (UTC)[reply]

Belated two cents --

I am not happy with the idea that an entry would require inclusion of a reference right from the get-go, as a necessary component of entry creation. As others have noted above, gathering and formatting references can be laborious. This process is somewhat similar in my mind to the process of gathering and formatting quotations. Some of our editors are very adept at that process, and seem to really enjoy doing so, as evidenced by participation in RFV threads.

I am very happy with the idea that an entry should have references as a general matter of style and entry structure.

‑‑ Eiríkr Útlendi │ Tala við mig 16:56, 7 September 2022 (UTC)[reply]

Though I sometimes enjoy the hunt for quotes, I usually do not. I view it more as a duty, which I sometimes neglect. If more contributors viewed it as a duty, this mandate would probably not have been proposed. DCDuring (talk) 17:26, 7 September 2022 (UTC)[reply]

Personally it would be a lot easier if I didn't have to provide a translation, but in general I can't really get behind that policy in the long run. Vininn126 (talk) 18:02, 7 September 2022 (UTC)[reply]

What do you mean by "that policy"?

It would be easier for you, but it does defeat the purpose of helping those who know more English than the language of the passage to be translated. Helping such people is, after all, the justification for what we do here, isn't it? DCDuring (talk) 23:35, 7 September 2022 (UTC)[reply]

"That policy" of not adding translations. And the reasoning you provided is exactly why I can't get behind it. Vininn126 (talk) 07:47, 8 September 2022 (UTC)[reply]

@Vininn126: Are you saying that you don't think our purpose is to help "those who know more English than the language of the passage to be translated"? DCDuring (talk) 14:15, 8 September 2022 (UTC)[reply]

What? I was saying that "while it would be much easier to just not translate, I believe we should". Vininn126 (talk) 14:16, 8 September 2022 (UTC)[reply]

Just view a quotation without a translation as a lot better than nothing. During the course of the next thirty years or so, it is likely that someone will add the translation, or replace the quotation with a better or more translatable one. --RichardW57m (talk) 10:50, 8 September 2022 (UTC)[reply]

I've been simulating this by providing the first attestation as a citation without a translation. Vininn126 (talk) 11:04, 8 September 2022 (UTC)[reply]

Main space vs. other namespaces contributions

I would like for us to introduce a rule by which editors would be forced to keep the ratio between their contributions to the main space and their total contributions at a certain level for them to be able to post in other namespaces. (What do I mean by "post"? I'd say asking questions about words is fine, but pretending to give one's opinion and have an impact on our policies without doing much useful work oneself isn't right.)

This would drive away "policy makers" and other prattlers who don't actively engage in expanding our coverage and actually improving the dictionary.

I already see an obvious way of gaming the system: make a few cosmetic edits to main space entries in a row, and there you are, your ratio is maintained and you can keep blathering away on our various talk pages. That's why the main space contributions would have to be substantial: for example, I would suggest taking into account the number of entries created for words that unobjectionably belong here. (By "unobjectionable", I mean "core vocabulary that no person in their right mind would ever think of excluding".)

Disclaimer: I readily acknowledge creating new entries about basic words is not the only way of doing useful work here, but I think it's a pretty good metric/indicator. Maybe not a necessary condition, but certainly a sufficient one.

P U C – 21:18, 3 September 2022 (UTC)[reply]
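The ratio rule and the gaming concern described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Edit` record, the `substantial` flag and the 0.5 threshold are assumptions made for illustration only, not something specified in the proposal or provided by the wiki software.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    namespace: str      # "main" for dictionary entries, anything else for talk/policy pages
    substantial: bool   # hypothetical flag, e.g. True for a newly created unobjectionable entry


def mainspace_ratio(edits, substantial_only=False):
    """Return the share of an editor's edits that are made in the main space.

    With substantial_only=True, cosmetic main-space edits are not counted
    in the numerator, which is the refinement suggested against gaming.
    """
    if not edits:
        return 0.0
    counted = [
        e for e in edits
        if e.namespace == "main" and (e.substantial or not substantial_only)
    ]
    return len(counted) / len(edits)


def may_post(edits, threshold=0.5, substantial_only=True):
    """Hypothetical gate: allow posting outside the main space above the threshold."""
    return mainspace_ratio(edits, substantial_only) >= threshold
```

Under these assumptions, an editor with six cosmetic main-space edits and four talk-page edits passes a naive ratio check but fails once only substantial edits count. The sketch says nothing about how "substantial" would be decided, which is where the judgement discussed in this thread comes in.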

@PUC So, if a certain Thai user creates enough bogus verlan entries, we should give them preference over someone who works in difficult languages that require extensive research for every edit? Chuck Entz (talk) 22:09, 3 September 2022 (UTC)[reply]

@Chuck Entz: I spoke of unobjectionable entries above, and a bogus entry is hardly that, so no. I'll admit that "unobjectionable" is left undefined, but everybody will agree that unobjectionable entries do exist (monomorphemic words such as dog would be a good start).

I'm aware such a system would require a good deal of discussion and flexibility to ensure that it's fair and doesn't exclude contributors who do deserve to have their seat at the table from discussion pages. I still think it could be an improvement on the current state of affairs. P U C – 22:22, 3 September 2022 (UTC)[reply]

@Chuck Entz I think this is a(nother) situation where good faith is called for. I'd apply a rule like this to a common sense test. Theknightwho (talk) 22:23, 3 September 2022 (UTC)[reply]

I was exaggerating to make my point. The truth is that there's a certain type of user who systematically creates entries for everything that doesn't get out of the way fast enough, and does it in huge volumes. Some of them manage to avoid obvious mistakes, in spite of knowing nothing on the subject matter. There's definitely a place for such editors, but the volume of their edits doesn't make them any more worthy of participating in discussions. I would contend that some people are too focused on edit counts alone, and I don't want to encourage it.

I also have trouble coming up with examples of more than a couple of main-space-slacking forum hogs- some of the most annoying recent discussions have involved people with substantial mainspace contributions, and there are lots of subject-matter experts we call on whose contributions are mostly elsewhere.

The main problem, though, is that a test like you're proposing requires someone to look through lots and lots of edits, which strikes me as a waste of time. Chuck Entz (talk) 22:52, 3 September 2022 (UTC)[reply]

In short we need to be able to look at an editor's 1) reasoning 2) knowledgeability and from THAT as a community be able to assign weight to arguments. It's very... unscientific/imprecise unfortunately. Vininn126 (talk) 23:03, 3 September 2022 (UTC)[reply]

You know, I agree, even as someone who does exactly that (i.e. 1) "systematically creat[ing] entries for everything that doesn't get out of the way fast enough" (me creating dozens of Armenian entries even though I don't speak a word of it) and 2) "[being] too focused on edit counts alone" (me obsessing over the number of entries I've created).) P U C – 23:12, 3 September 2022 (UTC)[reply]

We are often answering questions outside mainspace, so in general much of what is outside mainspace is preparatory for work in the mainspace. Some editors have also been scolded for not asking before implementing changes, though their useful edits to modules would not count towards the mainspace ratio either. Forced is a strong word; I think PUC is writing satire here in response to something above, or a witty remark about something desirable that is not possible, drawing attention towards an ideal or its distinctness from the feasible. Fay Freak (talk) 22:39, 3 September 2022 (UTC)[reply]

I appreciate that the gripe here is that some editors put in more useful work than others and that we should recognize them. (I want to die after my work at ackja.) I am unsure what is the best method for this. Currently we recognize certain users in a very unofficial way. This also ties in with the above BP discussion about FRD and the weight of arguments. I think it will take a feat of genius to think up a way to "weigh/weight" certain editors' opinions over others. Vininn126 (talk) 22:45, 3 September 2022 (UTC)[reply]

You probably mean akcja :-p P U C – 23:00, 3 September 2022 (UTC)[reply]

I spent more than 4 hours on that, I've earned my typo. Vininn126 (talk) 23:02, 3 September 2022 (UTC)[reply]

Ideally, this would be a cultural norm rather than a hard-and-fast rule, since it requires judgement to enforce the spirit of it whereas hard-and-fast rules can be gamed. But since cultural norms are harder to enforce than rules, and Wikipedia does in practice get observable value out of protecting certain pages and even talk pages against being edited by people with less than 500 edits / 30 days of activity as a hard-and-fast protection setting, I'm not saying a rule would be useless. But we do, so far, seem to more often have specific problematic editors (who could be blocked) rather than the sorts of organized harassment campaigns Wikipedia has needed to protect pages against. - -sche (discuss) 00:14, 4 September 2022 (UTC)[reply]

I remember once seeing a user be blocked on Wikipedia for using the site as a social network. He was quite young and perhaps simply didn't have a lot to offer in basic content editing. Also from Wikipedia was the phrase "voting-only account" for people with little interest in mainspace but a lot of strong opinions such that they were comparable to vandals. I would support having this type of block as an option here too, but I hope it would be very rarely used and that we should not need to measure a person's behavior in terms of numbers. —Soap— 21:51, 4 September 2022 (UTC)[reply]

The effort to quantify this seems doomed to failure without a great deal of effort. It does not seem worth the effort. I hope we can manage to find good reason, acceptable to a supermajority here and defensible to outsiders, to block someone who seems to violate vague behavioral norms without having to legislate. DCDuring (talk) 01:11, 5 September 2022 (UTC)[reply]

I am probably guilty of recently discussing a lot without having 500 edits during the last 30 days in the mainspace, but I would have thought all my previous contribution to mainspace, the thesaurus and elsewhere counts for something. If the unspoken cultural norm is that the previous contribution does not count and that a ratio has to be maintained on a floating 30-day basis or something, please let me know, and an informal guideline can be adopted to that effect, without being mathematically enforceable. It also seems to me that the OP is discounting the value of policy work; many are able and willing to do mainspace work but not all that many are able and willing to make passable policy proposals and back them with sound reasoning and evidence. I am one of the few people who had some success at designing passing votes and policy changes, including WT:THUB; more are at User:Dan Polansky/Votes created. I also think that my participation in RFD is valuable: unlike many others, I always try to provide specific reasoning, maybe too much for the taste of some; RFD could be more speedily administered if more people participated in it, and I think people should be encouraged to participate more in RFD discussions. --Dan Polansky (talk) 06:46, 7 September 2022 (UTC)[reply]

`{{surf}}` shouldn't categorize

It's misleading for e.g. subjugation to be included in Category:English terms suffixed with -ion. It's borrowed from Latin and does not come from *subjugate + -ion. I'm not opposed to including surface analyses, but I thought we wanted to distinguish affixed words from words that start or end with a certain string of letters; otherwise the categories are just wrong. Ultimateria (talk) 21:09, 4 September 2022 (UTC)[reply]

@Ultimateria: It is an English word containing the suffix -ion (unless you want to separate out words with the suffix -ation). You seem to be suggesting a separate category for words containing said morphemes. --RichardW57m (talk) 11:22, 5 September 2022 (UTC)[reply]

@RichardW57m: I don't think subjugation should be in any category for ending in -ion or -ation, because it was not suffixed in English. It doesn't "contain the suffix -ion", it just ends with those letters. Ultimateria (talk) 20:41, 5 September 2022 (UTC)[reply]

@Rua recategorised {{suffix}} from the etymology templates to the morphology templates on 24 July 2014. Any sane morphemic analysis of the word will find one of those suffixes in subjugation, whereas neither should be found in cation. Additionally, note that subjugation can be, and I'm sure often is, regenerated from subjugate in English. --RichardW57m (talk) 09:40, 6 September 2022 (UTC)[reply]

One place where the categorization might be useful is internationalisms like genetyka. Vininn126 (talk) 12:59, 5 September 2022 (UTC)[reply]

@Vininn126: Hmm, that seems fine to me despite the derivation being unclear. But I don't think the majority of uses come from that situation. Also, more broadly speaking, I acknowledge there's a gray area with terms derived from modern languages. One could argue that Spanish campeón, which is borrowed from Italian campione, is suffixed with -ón because the endings are analogous but distinct. In these cases I'm willing to leave the categories untouched. Ultimateria (talk) 20:41, 5 September 2022 (UTC)[reply]

Me, Thadh, and Surjection were having a big discussion on similar kinds of things where the borrowing is somehow adapted. Sadly, if we want that nuance, it might be more difficult to make sweeping changes. Vininn126 (talk) 20:44, 5 September 2022 (UTC)[reply]

I agree with Ultimateria. I also want to draw attention to the fact that many entries are manually categorized into "terms suffixed with -X" when they merely end in -X without being derived using -X. I think this is wrong, but judging by how widespread this practice is, many editors seem to disagree. — Fytcha〈 T | L | C 〉 16:03, 6 September 2022 (UTC)[reply]

I agree that not every word beginning with a sequence should be categorized - however, there are cases where it's useful. Another example (along with my above one) would be rocznik. The categorization is useful for uncertain situations. Vininn126 (talk) 16:07, 6 September 2022 (UTC)[reply]

I think it should categorize, especially for terms inherited rather than borrowed from an ancestor term, which is not the case for subjugation. Morphologically, the suffix is there even if it was not the method of production. One may claim the term has inherited the suffix, and there will be a corresponding ancestor suffix in the ancestor term. Whether the borrowed terms are a different case is not so clear. English -ion is descended from Latin -io, so here it fits. The -ion entry contains two definition lines marked "non-productive", and this is exactly what we are talking about. If we believe the -ion entry that the suffix is never productive, it would follow that CAT:English terms suffixed with -ion would be empty or near-empty. -ion is defined in Merriam-Webster[1]. The surface analysis is not based merely on the presence of a substring; it is a morphological analysis. --Dan Polansky (talk) 15:54, 7 September 2022 (UTC)[reply]

RFD header - abandoning text implying plain majority

Please comment on the following change of Wiktionary:Requests for deletion/Header:

Old:

If there is sufficient discussion, but a decision cannot be reached because editors are evenly split between two options, the request can be closed as “no consensus”, in which case the status quo is maintained.

New:

If there is sufficient discussion, but a decision cannot be reached because there is no consensus, the request should be closed as “no consensus”, in which case the status quo is maintained.

I highlighted the changed parts in boldface. The problem with the old text is that it implies that a plain majority suffices as consensus, which is not what the header said for over a decade, and which was not our practice, and I hope still isn't. Furthermore, the old text implies that editor votes should be tallied, which multiple editors opposed in previous discussions, requiring instead that strength of argument play a role. The new text matches approximately what was in the header for over a decade and it does not prejudge what "consensus" means in any way, merely stating that it is required. That is only fair and takes no sides in the unresolved debate about how to close RFD nominations. The change from "can" to "should" is minor and should be obvious. Thank you. Dan Polansky (talk) 10:20, 5 September 2022 (UTC)[reply]

Largish SoP Numerals

If a multi-word phrase is only used to denote a number larger than 100, and cannot appeal to WT:COALMINE, is it necessarily living on borrowed time? WT:CFI#Numbers,_numerals,_and_ordinals seems to imply so. If it is a synonym of a word that does meet CFI, and the phrase is the commoner way of expressing the number, how does one avoid creating a permanent red link to the phrase when listing synonyms? Not listing it seems wrong.

I've come across a Welsh number chwegain whose primary meaning is '120', whose spelt out synonyms are cant ac ugain and cant dau ddeg, which appear to be prohibited as lemmas if they only have the meaning '120'. The cardinal chwegain seems to be obsolescent - its main meaning seems to have become '50 pence', i.e. '120 pre-decimalisation pence' (shades of Welsh grôt)! --RichardW57m (talk) 12:13, 5 September 2022 (UTC)[reply]

It might be worth putting the individual words in [[ ]] (or however you best think it should be subdivided), so it's listed as a synonym but doesn't go to a redlink. Theknightwho (talk) 15:02, 5 September 2022 (UTC)[reply]

What I met is a by-form of chweugain, which already has those synonym problems. --RichardW57 (talk) 21:53, 5 September 2022 (UTC)[reply]

Should words with circumfixes also have the categories for the corresponding prefix and suffix?

There are many circumfixes that can be broken down into a prefix and a suffix, for example ver- -en, ont- -en, be- -en, ge- -t; and the {{head|circumfix}} template used on the articles for these circumfixes does automatically link to the prefix and suffix they're made of. However, words that use the {{circumfix}} template only get the category for the circumfix. I think these words could just as well be analysed as having the prefix and suffix.

I recently added the German section for be- -en and the corresponding category Category:German_terms_circumfixed_with_be-_-en (such an article and category already existed for the same circumfix in Dutch) and wanted to add this category to the appropriate words, which include many of the words currently in Category:German_terms_prefixed_with_be-, so I was looking through that category and changing them to use {{circumfix}}. They usually used a template like {{affix}} or {{confix}} before, and I noticed that when I replaced it with {{circumfix}}, they would no longer be in the categories for the prefix and suffix. Whether {{affix}}, {{confix}} or {{circumfix}} is used, it looks the same to the user, and it seems pretty arbitrary to me to choose one of the two analyses, as either having a circumfix or a combination of prefix and suffix; I think both are valid. As a user I would expect be- -en words to show up when I look through the German_terms_prefixed_with_be- or German_terms_suffixed_with_-en categories. Maybe the {{circumfix}} template should also add the corresponding prefix and suffix categories? Tajoshu (talk) 17:23, 5 September 2022 (UTC)[reply]
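To make the suggestion concrete, here is a minimal sketch in Python of the category logic being asked for. It is not the actual {{circumfix}} implementation: the function name `circumfix_categories` and the `include_parts` switch are hypothetical, and the category name patterns are simply copied from the category names mentioned above.

```python
def circumfix_categories(language, prefix, suffix, include_parts=True):
    """Return the category names a circumfixed term could be placed in.

    Hypothetical helper for illustration only; the category name patterns
    follow the ones quoted in the discussion.
    """
    # The circumfix category itself, which the template already adds.
    categories = [f"{language} terms circumfixed with {prefix} {suffix}"]
    if include_parts:
        # The proposed extra categories for the component prefix and suffix.
        categories.append(f"{language} terms prefixed with {prefix}")
        categories.append(f"{language} terms suffixed with {suffix}")
    return categories


if __name__ == "__main__":
    for name in circumfix_categories("German", "be-", "-en"):
        print("Category:" + name)
```

With `include_parts=False` the sketch reproduces the current behaviour described above (circumfix category only), so the proposal amounts to flipping that one switch.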

@Tajoshu: How are any of the above examples circumfixes? The ending part is inflectional, part of the infinitives that we happen to use as the citation forms of German verbs. When I clicked on Category:German terms circumfixed with be- -en I hoped that it would contain beschissen but it doesn’t, though even that is questionable on second thought.

So the greater problem is not that they can be analysed as being prefix plus suffix but that they can be seen as one part derivational and one part inflectional. While inflectional circumfixes exist, should the concept of a circumfix that is both inflectional and derivational be ascribed reality? And then, we don’t categorize inflections by the affixes employed anyhow. @Fytcha, Mahagaja, I suggest you exercise your discretion to take the appropriate measures.

Until you afford actual examples, to constitute discernible relevance in front of the ambitions of my intellect, I am not waker enough to deliberate the solution of your abstract question, and it may be the effective reason for others not answering. Fay Freak (talk) 19:29, 8 September 2022 (UTC)[reply]

Chinese etymology sections should not use zh

Chinese etymology sections should not use zh, especially for {{psm}}, where the pronunciation of the Chinese lect plays a large part in choosing the characters. The majority of the current uses of zh are in fact Mandarin-only, which should be cmn instead; meanwhile, Category:Mandarin terms derived from other languages is extremely underpopulated. Note that this usage results in nonsense such as Category:Chinese phono-semantic matchings from Cantonese. For orthographical loans such as {{wasei kango}}, the zh could remain, or we could use the inclusive zho instead.

In addition, in cases such as 麥當勞, Mandarin, Hakka, and Min Nan did not borrow the term directly from en; instead it came via Cantonese, which then loaned the orthography into Chinese, so it should be denoted by {{bor|yue|en|McDonald's}} and {{der|zh|yue|-}}. (Ideally there should be a separate category/template for this type of borrowing, where a word is first borrowed from another language and then becomes pan-Chinese, but I think this solution is better than what we are currently doing.)

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, ND381): – Wpi31 (talk) 17:49, 6 September 2022 (UTC)[reply]

@Wpi31 What I currently do is that if a term is only used in one Chinese lect, say Cantonese, then it is {{bor|yue}}; otherwise I would use {{bor|zh}}. Are you suggesting that it should instead be the first language where the borrowing (would have) occurred? Sometimes it might be difficult to determine, like 紐約 for example. — justin(r)leung { (t...) | c=› } 19:01, 6 September 2022 (UTC)[reply]

Side note: I do not think PSM is particularly more dependent on the pronunciation than other borrowings. In fact, it probably is the other way around since the pronunciation is the only thing that other borrowings depend on essentially. — justin(r)leung { (t...) | c=› } 19:03, 6 September 2022 (UTC)[reply]

@justinrleung: Yes, that's exactly what I'm suggesting: it should use {{bor|yue}} (if it's first in Cantonese), and then use {{der|yue|zh}} (or maybe {{obor|yue|zh}} if it makes sense). Terms where the original lect is unknown, such as 紐約 or 倫敦, can be kept as is.

Regarding the part about PSM, I meant all the different types of phonetic borrowings. Since the word is phonetically borrowed, there should be a reference to the pronunciation (if it's identifiable), not just a nonspecific zh.

(Please also read the next BP section, because I don't want to bother people by mass-pinging twice in a short period of time.)

– Wpi31 (talk) 04:25, 7 September 2022 (UTC)[reply]

I'm a bit cautious about this when it comes to terms where I don't know how much lectal spread there is, or when it's uncertain at what period it entered Chinese at all. For example, I can be sure of the origin of 騰格里 (Ténggélǐ) without knowing which lect it entered first (probably Middle Chinese, but I don't know). Theknightwho (talk) 15:51, 7 September 2022 (UTC)[reply]

Categorisation (topics and labels) in Chinese

Currently we have Category:zh:All topics but no Category:cmn:All topics or Category:yue:All topics. (or Category:hak:All topics which exists but contains a category which in turn contains itself lol) Also, there are no separate categories for Category:Mandarin vulgarities or Category:Cantonese vulgarities, only Category:Chinese vulgarities, which lumps everything into one category. (likewise for other categories) This causes a user who is only interested in, say, Mandarin derogatory terms to be overwhelmed by words in other lects in Category:Chinese derogatory terms, which contains 1000+ entries, many of which aren't relevant. These issues leave these categories with low to zero usability. Besides, the Chinese categories are sorted by radical, which isn't something most learners and online users are familiar with, whereas splitting them by lect also allows sorting of words in an order based on the lect's phonology.

Therefore, I am proposing that we split the above categories by lect, while keeping the current Chinese category alongside the new ones. The {{cln|zh}} and {{topics|zh}} ones should be fairly easy to do; simply adding {{cln|cmn}}, {{cln|yue}}, {{cln|hak}}, {{cln|nan}}, {{cln|wuu}}, etc. should do the job.

For the categories generated by {{lb}}, I am thinking of a {{zh-lb}} which would sort pages with some trickery? For example, the second sense on 屌 could be something like {{zh-lb|c,h,p,Zhongshan Min, Guangxi Mandarin|vulgar}}. (Also, while we are at this, the not-so-ideal {{lb|zh|dialectal}} could be also cleaned up in this process)

(don't want to mass ping again, so I'll just hope everyone gets the ping from the previous section and reads this as well) Wpi31 (talk) 18:30, 6 September 2022 (UTC)[reply]
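As a rough illustration of the per-lect split proposed above, the following Python sketch shows how a label could fan out into one umbrella Chinese category plus one category per lect. It is only a sketch under stated assumptions: `LECT_NAMES`, `LABEL_CATEGORIES` and `label_categories` are hypothetical names, the lect codes are the language codes listed in the proposal rather than the short abbreviations used in the {{zh-lb}} example, and the per-lect category names follow the pattern of Category:Mandarin vulgarities and Category:Cantonese vulgarities mentioned above.

```python
# Language codes from the proposal, mapped to the names used in category titles.
LECT_NAMES = {
    "cmn": "Mandarin",
    "yue": "Cantonese",
    "hak": "Hakka",
    "nan": "Min Nan",
    "wuu": "Wu",
}

# Labels mapped to the tail of the category name (assumed from the discussion).
LABEL_CATEGORIES = {
    "vulgar": "vulgarities",
    "derogatory": "derogatory terms",
}


def label_categories(lects, label):
    """Return the umbrella Chinese category followed by one category per lect."""
    tail = LABEL_CATEGORIES[label]
    # The existing Chinese category is kept alongside the new per-lect ones.
    categories = [f"Chinese {tail}"]
    for code in lects:
        categories.append(f"{LECT_NAMES[code]} {tail}")
    return categories


if __name__ == "__main__":
    for name in label_categories(["yue", "hak"], "vulgar"):
        print("Category:" + name)
```

Whether this logic would live in a new {{zh-lb}} or inside the existing {{lb}} is exactly the open question; the sketch only shows that the mapping itself is small.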

@Wpi31: This is definitely something that should be dealt with. While having a new template is probably easier, I wonder if it's possible to do with the current {{lb}} template. — justin(r)leung { (t...) | c=› } 04:44, 7 September 2022 (UTC)[reply]

I would rather we do this. The long-term goal is probably to integrate the other Chinese-specific templates into the main ones if at all possible. Fundamentally, any special features can be achieved by simply checking what the language-code is and implementing them - not so easy for the stuff that's already separate, but with new things we should definitely take that approach. Theknightwho (talk) 16:34, 7 September 2022 (UTC)[reply]

OED treatment of proper nouns

Let me drop the following investigation I made elsewhere here in BP for later reference. OED, a classic and very impressive dictionary, lacks the surname Darwin, the first names Martin and Paula, and the cities of London and New York; in New York, they define it as some kind of attributive adjective, mentioning New York only in the etymology. From the New York entry, one would get the impression that OED avoids proper nouns and specific entities like the plague: they would rather include New York as an attributive adjective than be forced to admit New York is a city. But OED has Sirius (star), Mars (planet) and Milky Way; it has Homo sapiens; it has the river Thames but not the river Nile; they have an entry Nile, but the river is only in the etymology. The OED Canada entry has the country only in the etymology. The OED Europe entry has the continent only in the etymology and has European Union as the only sense. OED has no entries for Asia, Ontario and Germany. But OED has China as a country. OED Star Wars has the military defense strategy as the sole sense, having the franchise only in the etymology. The selection criteria of OED for proper names remain elusive; they seem pretty chaotic and inconsistent. From our voted-on coverage of proper names (esp. geographic ones) it follows that we do not plan to follow the OED in its exclusion of proper names and specific entities. Ontario is instructive: we have 14 specific entities covered. If someone wants to have a look at more examples of what OED is doing and post the results here, that would be cool. --Dan Polansky (talk) 06:35, 7 September 2022 (UTC)[reply]

interface-editor group proposal

Such a group (Επεξεργαστές της διεπαφής) is present in Greek Wiktionary.

I need the editinterface right to edit a page in the MediaWiki namespace, and I don't need the sysop group. —Игорь Тълкачь (talk) 16:54, 7 September 2022 (UTC)[reply]

Is it really necessary to create a new role? We could imagine repurposing/renaming the interface administrator role to what you're suggesting, and creating a nomination process to that role as well.

@Chuck Entz, Surjection, are the roles of administrator and interface administrator bound in some way? Does an interface administrator have all the prerogatives of an administrator? P U C – 17:20, 8 September 2022 (UTC)[reply]

They are historically. It used to be that only admins could edit interface pages until it was split off into a separate group. What is the problem here that makes it impractical to request edits to the appropriate pages, or that requires adding new user groups? — SURJECTION / T / C / L / 19:18, 8 September 2022 (UTC)[reply]

Changes to WT:LEMMING

I made some changes to WT:LEMMING and was reverted. My changes are in diff. Thus, I propose:

1) Add "OED, AHD, Cambridge, Collins, Macmillan, Longman, German Duden and Spanish DRAE" as example dictionaries. These are the dictionaries that were listed in the lemming vote and these are the kind of dictionaries that are being mentioned in support of LEMMING. They match the definition of "general monolingual" dictionary already present in LEMMING, entered there by me based on the 2014 discussion some years ago.
2) Add Talk:George VI and Talk:Joan of Arc as examples where LEMMING was not followed. Thus, the reader will see at least part of the extent to which this is not actually applied. Seems very useful. More examples could be added as objective evidence of actual acceptance and rejection of the principle.
3) Add "Further discussions can be found from Special:WhatLinksHere/Wiktionary:LEMMING." This is useful for the reader who wants to know how far the principle has been invoked in discussions. Nothing wrong with that, from what I can see.
4) Add 'History: The principle arrived to this page via diff on 7 September 2007 in a different form: "Terms that have entries in other dictionaries, especially specialized ones." The principle proposed in 2014 was about general dictionaries, not specialized ones. The term "lemming test" occurred in a 2007 discussion at Talk:genuine issue of material fact.'
This is to honestly report that a) some form of the principle is as old as 2007, and b) the principle was originally specified in terms of "specialized" dictionaries. It is accurate and of historical interest. Discussions before 2014 that referenced the principle invoked it in that form. I see nothing wrong with that: again, accurate and interesting.

Please comment, and sorry for the bother. Dan Polansky (talk) 11:40, 8 September 2022 (UTC)[reply]

Revised Enforcement Draft Guidelines for the Universal Code of Conduct

You can find this message translated into additional languages on Meta-wiki.

Hello everyone,

The Universal Code of Conduct Enforcement Guidelines Revisions committee is requesting comments regarding the Revised Enforcement Draft Guidelines for the Universal Code of Conduct (UCoC). This review period will be open from 8 September 2022 until 8 October 2022.

The Committee collaborated to revise these draft guidelines based on input gathered from the community discussion period from May through July, as well as the community vote that concluded in March 2022. The revisions are focused on the following four areas:

1) To identify the type, purpose, and applicability of the UCoC training;
2) To simplify the language for more accessible translation and comprehension by non-experts;
3) To explore the concept of affirmation, including its pros and cons;
4) To review the balancing of the privacy of the accuser and the accused.

The Committee requests comments and suggestions about these revisions by 8 October 2022. From there, the Revisions Committee anticipates further revising the guidelines based on community input.

Find the Revised Guidelines on Meta, and a comparison page in some languages.

Everyone may share comments in a number of places. Facilitators welcome comments in any language on the Revisions Guideline Talk Page. Comments can also be shared on talk pages of translations, at local discussions, or during conversation hours.

There are planned live discussions about the UCoC enforcement draft guidelines; please see Meta times and details: Conversation hours

The facilitation team supporting this review period hopes to reach a large number of communities. If you do not see a conversation happening in your community, please organize a discussion. Facilitators can assist you in setting up the conversations.

Discussions will be summarized and presented to the drafting committee every two weeks. The summaries will be published here.

On behalf of the T&S Policy Team Mervat (WMF) (talk) 11:11, 9 September 2022 (UTC)[reply]