Why there aren't 40,000 newsgroups

This is an article I posted to alt.binaries.news-server-comparison in December, 1998, during a discussion of active file sizes. I compared the contents of the Supernews active file with that of Onlynews, because that provider was mentioned in the thread as having a large newsgroup list.

It is worth noting that, after I posted this, a news administrator from Onlynews contacted me and asked for the full list of groups that were missing from their list (which I provided), and said that they were going to address the problem. The specific data in this article is now out of date, but the general comparisons and conclusions remain valid.

The article is presented here as posted, except for the HTML markup. It is not archived in Dejanews, as they do not archive that newsgroup.


Newsgroups: alt.binaries.news-server-comparison
Subject: Re: Family Oriented Server
From: Jeremy <jeremy@exit109.com>
Date: 23 Dec 1998 07:22:47 GMT
Message-ID: <75q5o7$b94$1@east44.supernews.com>

DC <drcorley@a.crl.com> wrote:

> if the ISC isn't sufficient for the "alt.*" and others area specific groups
> then we have the current problem, where no matter how many junk groups you
> have, if you don't carry alt.such-and-such, no matter how obscure and
> lacking in traffic it may be, your server and service is viewed as
> something less than the server that does....

Okay, so I did some comparison. I grabbed the Onlynews active file from the webpage and compared it to mine. When I say "mine" I mean the active file from Supernews/RemarQ: my working list from the East system (which is what the West system will look like after the next group sync, scheduled for tomorrow).

This is long, so if you're not interested in comparing news servers, and why this isn't a valid comparison, move along.

My active contains 31797 groups; Onlynews has 66405.

One might expect Onlynews to have all the groups I have, plus a lot of other crap. But there are about 698 groups missing from the Onlynews active file, as compared with mine. This is after removing our local groups, of course.

118 of the "missing" groups are in alt.*; 20 of these are alt.binaries groups, including alt.binaries.pictures.centerfolds.playboy, alt.binaries.pictures.suntan, and alt.binaries.sounds.mp3.complete_cd, which I picked out because I have received multiple user requests for them.
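The comparison described above is essentially a set difference on group names. Below is a minimal Python sketch of it. It assumes both lists are in the usual active-file layout (group name first, then high and low article numbers and a posting flag, separated by whitespace) and that local groups can be recognized by name prefix; `LOCAL_PREFIXES` and its values are hypothetical placeholders, not Supernews's real local hierarchies. It also tallies the result by hierarchy, which is how figures like "118 in alt.*, 20 in alt.binaries" fall out.

```python
"""Sketch: groups in one active file that are absent from another."""
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical local hierarchies to leave out of the comparison.
LOCAL_PREFIXES: Tuple[str, ...] = ("supernews.", "remarq.")


def group_names(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-separated field of each non-blank line.

    In an active file that field is the group name; the article numbers and
    posting flag that follow it are ignored.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_groups(mine: Iterable[str], theirs: Iterable[str],
                   local_prefixes: Tuple[str, ...] = LOCAL_PREFIXES) -> Set[str]:
    """Names in `mine` with no exact match in `theirs`, minus local groups."""
    ours = {g for g in group_names(mine) if not g.startswith(local_prefixes)}
    return ours - group_names(theirs)


def by_hierarchy(groups: Iterable[str], depth: int = 1) -> Counter:
    """Tally groups by their first `depth` name components (e.g. 'alt')."""
    return Counter(".".join(g.split(".")[:depth]) for g in groups)
```

Because the match is exact, any stray character in the other file makes a group look missing, so the output is a list to check by hand rather than a verdict.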

Perusing the other alt.* groups, I find of course some joke groups that don't really matter all that much, but I also see real, legitimate newsgroups there, some of which my users have asked me for, and some of which I have even posted to.

In addition, the regional and foreign-language hierarchies are well represented among the missing groups, including at.*, bln.*, ch.*, cl.*, de.*, ee.*, fj.*, hun.*, it.*, japan.*, nl.*, no.*, and pa.*. I have run checkgroups for all of those recently, so I know I'm up to date. Onlynews is also slacking a little on the company groups, for example the redhat.* hierarchy, and a biz.* group is missing as well. There are, of course, others in hierarchies I am not familiar with; I don't know how important they are.
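Checking a hierarchy against its checkgroups message comes down to another pair of set differences. A rough Python sketch follows; it assumes each relevant checkgroups line holds a group name followed by whitespace and a free-text description, and the function name and the sample data are made up for illustration.

```python
from typing import Iterable, Set, Tuple


def checkgroups_diff(checkgroups_lines: Iterable[str],
                     active_names: Iterable[str],
                     hierarchy: str) -> Tuple[Set[str], Set[str]]:
    """Compare a hierarchy's checkgroups list with the groups actually carried.

    Returns (missing, extra): groups listed in checkgroups but not carried,
    and groups carried under the hierarchy that checkgroups does not list.
    """
    prefix = hierarchy.rstrip(".") + "."
    official = set()
    for line in checkgroups_lines:
        # Assumed layout: group name, whitespace, then a description.
        fields = line.split(None, 1)
        if fields and fields[0].startswith(prefix):
            official.add(fields[0])
    carried = {name for name in active_names if name.startswith(prefix)}
    return official - carried, carried - official
```

Groups in `extra` are candidates for removal and groups in `missing` are candidates for addition; either way a person still makes the call.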

Most interestingly, my script claims that the Onlynews active file does not contain news.announce.newgroups or news.announce.newusers. Amazing, I think, so I check by hand. It seems that their list does contain a group "news.announce.newgroups:" (note the colon) and the same for the other; can we chalk that up as an error in the file on the webpage, and assume Onlynews couldn't possibly be missing those groups?
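A script that matches names exactly will report news.announce.newgroups as missing when the other list spells it "news.announce.newgroups:". One way to catch these false alarms, sketched in Python below, is to strip trailing characters that cannot be part of a group name and see whether the cleaned entry matches a group reported missing. Treating everything outside lowercase letters, digits, '+', '-', '_' and '.' as junk is an assumption of this sketch, and `near_misses` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable, Set

# Assumption: a run of characters at the end of an entry that are not
# lowercase letters, digits, '+', '_', '.' or '-' is stray punctuation.
_TRAILING_JUNK = re.compile(r"[^a-z0-9+_.-]+$")


def near_misses(missing: Set[str], their_names: Iterable[str]) -> Dict[str, str]:
    """Map each 'missing' group to an entry in the other list that differs
    from it only by trailing junk, such as 'news.announce.newgroups:'."""
    hits = {}
    for raw in their_names:
        cleaned = _TRAILING_JUNK.sub("", raw)
        if cleaned != raw and cleaned in missing:
            hits[cleaned] = raw
    return hits
```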

This is what I mean about a large active file probably meaning a poorly-maintained one.

Now, on the other side, there are obviously a lot of groups that Onlynews has which I don't. What could they be? Let's take a sampling of what I'm missing:

(Yes, these really are "newsgroups" in the Onlynews active file, one per line)

$1
------------------
0.test
0000000
01alt.binaries.pictures.girlfriends
1.1.1.1
100622.2474
43
BMO2260.SBSHERIFF.ORG
Database
Hello
a.a.a
aaa
alt.0-011.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
alt.0-070.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
alt.0000a.this-site.newgroups.everything
alt.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z

Okay, okay, I'll relax a bit. I just had to include a small sample of the list of crap; there is a lot more just like this. But there are a few other things I'd like to point out.
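Much of this junk can be flagged mechanically. The Python sketch below applies a few heuristic checks to each name. The rules (lowercase components built from letters, digits, '+', '-' and '_', at least one dot, no empty or all-digit components) are assumptions chosen for illustration, not a complete restatement of any naming standard, and `name_problems` is a hypothetical helper.

```python
import re
from typing import List

# Assumed convention for this sketch (not quoted from any standard):
# a component is made of lowercase letters, digits, '+', '-' and '_'.
_COMPONENT_CHARS = re.compile(r"[a-z0-9+_-]+")
_HAS_ALNUM = re.compile(r"[a-z0-9]")


def name_problems(name: str) -> List[str]:
    """Return sorted reasons why an active-file entry looks bogus."""
    problems = set()
    if "." not in name:
        problems.add("single component")
    if name != name.lower():
        problems.add("uppercase letters")
    for component in name.lower().split("."):
        if component == "":
            problems.add("empty component")  # leading, trailing or doubled dot
        elif not _COMPONENT_CHARS.fullmatch(component):
            problems.add("unexpected characters")
        elif component.isdigit():
            problems.add("all-digit component")
        elif not _HAS_ALNUM.search(component):
            problems.add("component without letters or digits")
    return sorted(problems)
```

Names such as alt.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z pass every check, so a filter like this only removes the most obvious garbage.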

alt.anonimous.messages
alt.anontmous.messages
alt.anonymous-messages
alt.anonymous.message
alt.anonymous.messagess
alt.anonymous.messgaes

Er, will the real alt.anonymous.messages please stand up?
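Variant spellings like these can be found with Python's standard difflib module. The sketch below asks `difflib.get_close_matches` for names similar to a known group; the 0.9 similarity cutoff is an arbitrary assumption that happens to suit one- or two-character typos in long names, and `variants_of` is a hypothetical helper name.

```python
import difflib
from typing import Iterable, List


def variants_of(group: str, names: Iterable[str], cutoff: float = 0.9) -> List[str]:
    """Return names that closely resemble `group` but are not identical to it.

    Similarity is difflib's SequenceMatcher ratio; `cutoff` is a guess.
    """
    candidates = [n for n in set(names) if n != group]
    # n must be positive, so fall back to 1 when there are no candidates.
    close = difflib.get_close_matches(group, candidates,
                                      n=len(candidates) or 1, cutoff=cutoff)
    return sorted(close)
```

Running this for every one of 66,000 names against all the others would be slow, since each call compares the target with the whole list; it is more practical for checking a handful of well-known groups.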

alt.bainaries.pictures.babies

They've got a whole huge list of misspellings of "binaries".

alt.barney.die.die.die
alt.barnie.die.die.die

I think we really only needed one of those. :)

Now, for the important stuff, there are quite a few alt.binaries groups (with "binaries" spelled right) missing from my active file. Among them:

alt.binaries.erocitca.cheerleaders
alt.binaries.0000020788rotica.pornstars
alt.binaries.dominion.bugger.snot.slime.ball
alt.binaries.games.alt.cracks
alt.binaries.pcitures.erotica.black.male
alt.binaries.pictures.eortica.dbg
alt.binaries.pictures.erotica.  (trailing dot)
alt.binaries.pictures.erotica.breats
alt.binaries.pictures.erotica.brests
alt.binaries.pictures.erotica.bymnasts-girls

Wow, at this rate I could get to 60k groups in no time!

We also have a bit of "active file cascading":

alt.desert.storm
alt.desert.storm.its
alt.desert.storm.its.not
alt.desert.storm.its.not.scud
alt.desert.storm.its.not.scud.its
alt.desert.storm.its.not.scud.its.al-hussein
alt.desert.storm.its.not.scud.its.al-hussein.dammit

alt.fan.tom-servo.i
alt.fan.tom-servo.i.am
alt.fan.tom-servo.i.am.so
alt.fan.tom-servo.i.am.so.cool
alt.fan.tom-servo.i.am.so.cool.i
alt.fan.tom-servo.i.am.so.cool.i.cant
alt.fan.tom-servo.i.am.so.cool.i.cant.believe
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself.so
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself.so.i
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself.so.i.will
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself.so.i.will.try
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself.so.i.will.try.to
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself.so.i.will.try.to.break-your-newsreader
alt.fan.tom-servo.i.am.so.cool.i.cant.believe.myself.so.i.will.try.to.break-your-newsreader.to

Wow, that's almost as fun as meowing.
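Cascades like these have a recognizable shape: a long run of groups in which each name is its predecessor plus one more component. The Python sketch below measures, for every group, how many of its successive parent names are also in the list, and reports the tip of any run at least `min_depth` deep. The default of 4 is an arbitrary threshold that most ordinary hierarchies should stay below, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def cascade_depths(names: Iterable[str]) -> Dict[str, int]:
    """Count how many consecutive parent names of each group are also groups.

    'a.b.c.d' has depth 2 if 'a.b.c' and 'a.b' are in the list but 'a' is not.
    """
    pool = set(names)
    depths = {}
    for name in pool:
        depth = 0
        parent = name
        while "." in parent:
            parent = parent.rsplit(".", 1)[0]
            if parent not in pool:
                break
            depth += 1
        depths[name] = depth
    return depths


def cascades(names: Iterable[str], min_depth: int = 4) -> List[str]:
    """Return the deepest member (the tip) of each suspiciously long run."""
    pool = set(names)
    tips = []
    for name, depth in cascade_depths(pool).items():
        if depth < min_depth:
            continue
        # A tip has no descendants in the list.
        prefix = name + "."
        if not any(other.startswith(prefix) for other in pool):
            tips.append(name)
    return sorted(tips)
```

The descendant check scans the whole list for each candidate, which is fine for a one-off report but would want a proper index for a full 66,000-line file.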

They've also got the entire alt.fan.dean-stark.* hierarchy -- all 142 groups. I'm impressed, I haven't been able to keep up with it. Oh, and look -- all 65 of the alt.pedophile.* groups!

But let's move on from alt.*. It's a moving target, hardly a totally fair comparison.

They've got all the aol.neighborhood.* groups listed -- nice, except that AOL doesn't propagate them, making them useless and misleading.

There are a huge number of bogus groups in the regionals, as well. My god, the hierarchy maintainers must be pulling their hair out.

The entire news.admin.* bogus group flood is present. From news.admin.agriculture.fruit to news.admin.transport.urban-transit, including all the news.admin.pedophile.* groups. Repeated all over again for the news.groups.* flood, the news.pedophile.* flood, and the news.misc.* flood. What a mess. They ought to be ashamed.

Oh, and news.announce.meow, of course.

I find myself wondering whether NewsGuy lets Onlynews carry their local groups, because they're all on the list too.

Finally, we have tons of university hierarchies, ones that don't propagate outside their home servers and thus are full of the occasional spam and a few people saying "hey, where is everyone?" I know because I've gone through and looked at a lot of them, and contacted the news admins at some of the schools to ask about them. The uiuc.* groups, for example, do not propagate outside UIUC.

Now, I admit, I found twenty or so alt.* groups that I think I'm going to have to add. Like I said, alt.* is a moving target. You got me there.

But what I see here is a huge, bloated mess of an active file that I'd hate to have to wade through as a user, never knowing whether that group I'm posting to is actually being seen by anyone else, never knowing what groups I'm missing out on in my regionals, etc.

So. There aren't 40,000+ newsgroups. That's my story, and I'm sticking to it.


