Content Analysis

By the Project for Excellence in Journalism

How far has online journalism come?

To what degree, in other words, are news sites delivering on the promise of the Internet – providing news with interactivity and multimedia capability, updating with new information, and delivering a different kind of journalism? Or is Internet journalism, at least at the mainstream news sites that get most of the traffic, a dumping ground for yesterday’s copy and a place dominated by third-party wire copy?

To find answers, the Project conducted a content analysis of nine news Web sites, including the three most popular as measured by ratings.

Among the highlights:

The Project looked at a range of Web sites throughout each day of our study, rather than just once a day. We looked at nine news sites – two from cable television (CNN and Fox), two associated with broadcast television networks (ABCNews.com and MSNBC.com, which is affiliated with both MSNBC cable news and NBC), two Internet-only sites (Yahoo and AOL) and two newspaper sites (Washingtonpost.com for a large-circulation market and www.pantagraph.com of the Bloomington, Illinois, Pantagraph for a small-market newspaper). Finally, we analyzed www.cbs11tv.com, the Web site of Dallas CBS 11, as an example of a local television site.

Altogether, 1,903 news articles were sampled for twenty weekdays scattered between January and October. We rotated four different download times: 9 a.m., 1 p.m., 5 p.m. and 9 p.m., sampling one for each day. For five of the twenty days, we downloaded articles on each site for all four time periods. The study examined all articles on the front page tied to a graphic image, plus the next top three articles. It also noted the multimedia links within each article, specifically photo galleries, video, and graphics.
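The rotation described above can be illustrated with a short sketch. It is only an approximation of the sampling design: the report does not say which five days received all four downloads or in what order the times rotated, so the `full_days` default and the simple round-robin order below are assumptions made for illustration.

```python
from itertools import cycle

# The four download times named in the study (Eastern time).
DOWNLOAD_TIMES = ["9 a.m.", "1 p.m.", "5 p.m.", "9 p.m."]


def build_schedule(num_days=20, full_days=(3, 7, 11, 15, 19)):
    """Return a list of (day_index, times) pairs for one site.

    Each day gets one download time, rotating through DOWNLOAD_TIMES.
    Days listed in full_days get all four times instead.
    The choice of full_days is hypothetical; the report does not name them.
    """
    rotation = cycle(DOWNLOAD_TIMES)
    schedule = []
    for day in range(num_days):
        rotating_time = next(rotation)  # advance the rotation every day
        if day in full_days:
            times = list(DOWNLOAD_TIMES)
        else:
            times = [rotating_time]
        schedule.append((day, times))
    return schedule


schedule = build_schedule()
full_day_count = sum(1 for _, times in schedule if len(times) == 4)
```

Under these assumptions, `schedule` holds twenty days, five of which carry all four download times and fifteen of which carry a single rotating time.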

Originality of Reporting

How much original reporting occurs online?

The data continue to suggest that Internet journalism, at least on the major news sites studied, is still largely second-hand material, usually from the old media.

The fact that little has changed in the last year may be a sign that less progress is being made on the content side of the Internet than on the economic side.

While the sample last year was slightly different (see footnote 1), and some of the sites have changed (ABCNews.com replaced CBS.com), we can still observe some patterns.

The amount of effort put into updating or modifying wire copy with some original work has declined. The percentage of stories that were a combination of staff and wire dropped to just 9% from 23% a year earlier.

And the percentage of wire-service stories posted without any sign of editing rose, to 58% this year from 42% a year earlier.

Meanwhile, the percentage of original work remained the same as a year earlier (32% of all stories were bylined staff-written). And that original work was even more limited than before to just a few sites.

In other words, despite the migration of audience to the Web, there is no overall sign, at least at the sites studied, of any surge in originality of content. That seems to be reinforced by evidence of staff cutbacks and even more limited resources we found in examining Newsroom Investment.

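Story Origination, 2004
Percent of All Stories

| Origin | Total | AOL | ABC | CNN | Dallas | Fox | Bloom. | MSNBC | WPost | Yahoo |
|---|---|---|---|---|---|---|---|---|---|---|
| Staff | 32% | 1% | 25% | 54% | 12% | 14% | 96% | 13% | 83% | 1% |
| Wire & Staff | 9 | 0 | 1 | 15 | 14 | 24 | 1 | 24 | 2 | 0 |
| Wire | 58 | 98 | 74 | 30 | 73 | 62 | 4 | 63 | 15 | 99 |
| Other Org. | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |

Totals may not equal 100 because of rounding.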
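As a quick check on the rounding note, the column totals of the story-origin table can be recomputed. The sketch below simply re-enters the published percentages; the variable and function names are our own and not part of the study.

```python
# Published 2004 story-origin percentages, in column order.
SITES = ["Total", "AOL", "ABC", "CNN", "Dallas", "Fox",
         "Bloom.", "MSNBC", "WPost", "Yahoo"]

STORY_ORIGIN_2004 = {
    "Staff":        [32, 1, 25, 54, 12, 14, 96, 13, 83, 1],
    "Wire & Staff": [9, 0, 1, 15, 14, 24, 1, 24, 2, 0],
    "Wire":         [58, 98, 74, 30, 73, 62, 4, 63, 15, 99],
    "Other Org.":   [1, 2, 0, 1, 0, 0, 0, 0, 0, 0],
}


def column_totals(table, columns):
    """Sum each column of a row-oriented percentage table."""
    return {
        name: sum(row[i] for row in table.values())
        for i, name in enumerate(columns)
    }


totals = column_totals(STORY_ORIGIN_2004, SITES)

# Columns whose rounded percentages do not add up to exactly 100.
off_by_rounding = {site: t for site, t in totals.items() if t != 100}
```

Every column lands within one point of 100; only AOL (101), Dallas (99) and Bloomington (101) miss it, which is consistent with rounding rather than with missing categories.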

Could the rise in wire copy be just a sign of our sample’s changing? The evidence suggests it is more than that. Much of the change occurred on the five sites that we studied both this year and last.

That was particularly true at the three cable sites studied. At CNN, for instance, the percentage of wire copy doubled, from 15% to 30%, while original work dropped from 75% a year earlier to 54% this year. CNN, however, is still clearly head and shoulders above the other cable sites in the percentage of original reporting.

We saw similar changes at MSNBC. The amount of straight wire copy rose from 23% to 63%, while the percentage of stories in which MSNBC’s staff made some attempt to customize or add to the wires fell by half, from 48% to 24%. Original stories fell from 17% to 13%.

Fox News changed the least. Its slight rise in wire copy was not statistically significant.

Sites still vary greatly, but one change from a year earlier is that the nature of the parent company’s original medium seems to be more tied, not less, to the kind of site it produces. Most notably, the two newspaper sites far outweigh the others in original reporting. Fully 96% of the Bloomington site, www.pantagraph.com, is original, suggesting a decision to stay local rather than carry wire or other more national stories. And 83% of the lead stories at Washingtonpost.com were original.

Broadcast-based sites, on the other hand, range from 54% original content on CNN to just 13% on MSNBC. And the local television site’s approach is quite different from that of the local newspaper. Just 12% of the content on cbs11tv.com is original reporting, with 73% wire.

A year earlier, we found that news sites fell into one of these three categories in producing lead stories on their front pages:

What appeared to be happening in 2004 was that the middle category was shrinking. The nine sites studied ran either their own stories or straight wire copy, with little editing of outside content. The lone exception was CNN, in which half the copy was original and most of the rest wire.

As was the case in 2003, the two aggregators, AOL and Yahoo, made no attempt to produce their own work on their lead breaking stories.

Does it matter where a site’s news comes from? The risk of relying predominantly on wire copy is that it means entrusting the accuracy of the copy to someone else. You have made no attempt to verify independently. The growing tendency this year to run wire without any kind of staff input or editing suggests even greater risk.

On the other hand, if we analyze the depth of reporting, particularly the sourcing, of the wire copy and the original copy, the data suggest little difference between them. Wire and staff copy had nearly equal levels of transparency of sources, range of viewpoints and number of stakeholders – all of which were quite high.

The issue, then, may be more a matter of repetitiveness. Many outlets end up carrying the same stories. The amount of original reporting is not growing. Attempts to do some editing and checking of wire copy are declining. If a story does prove erroneous, then, the spread of the error will be all the greater.

A Place for Continuous Updating and Follow-Up

Like cable television news, the Internet offers the ability to continuously update users with the latest turn of events.

One goal of the study was to determine how much new information news Web sites actually posted through the day. To do so, for five of the days studied, we checked every four hours to see what percentage of the lead stories were altogether new, what percentage were unchanged and what percentage were in some way updated.

Moreover, there are degrees of updating: Was there something substantively new to the stories, were just some minor details added, or was it a rewrite around a new angle?

Story Freshness
Percent of All Stories

| Freshness (All Stories) | 2003 | 2004 |
|---|---|---|
| Exact Repeat | 21% | 26% |
| Repeat: No New Substance | 14 | 2 |
| Repeat: New Angle | 2 | * |
| Repeat: New Substance | 14 | 11 |
| New Story | 49 | 60 |

Totals may not equal 100 because of rounding.

What we found, generally, was a tendency toward posting more new stories and updating fewer running stories than a year earlier. This, too, may be a sign of sites adding new technology to process more copy, but then having fewer staff members on hand to update major stories as new information becomes available. Machines, rather than journalists, may be defining the changing nature of online news at the moment.

The most striking change was that compared with a year earlier, the sites studied posted more new stories on completely different topics as the day wore on. This year, indeed, the majority of stories turned over (60%, up from 49% last year; see footnote 2).

Another quarter of the stories (26%) were left unchanged through the day, up slightly from 21% a year earlier.

Thus only 13% were stories that had some level of updating, less than half the 30% of a year earlier. There was one positive sign, however. The vast majority of those were adapted with substantive new information. That was markedly different from a year earlier, when it was just as likely that a story would be tweaked with only minor new details. So fewer stories were being updated, but of those few, more were being substantively or meaningfully updated than a year earlier.
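The arithmetic behind these figures can be reproduced from the Story Freshness table. The sketch below is illustrative only: the names are our own, and the asterisk in the 2004 "New Angle" cell is treated as zero, which is an assumption about what the asterisk denotes.

```python
# Percent of all stories by freshness category, from the Story Freshness table.
# The "*" reported for 2004 "Repeat: New Angle" is entered as 0 (assumption).
FRESHNESS = {
    "Exact Repeat":             {2003: 21, 2004: 26},
    "Repeat: No New Substance": {2003: 14, 2004: 2},
    "Repeat: New Angle":        {2003: 2,  2004: 0},
    "Repeat: New Substance":    {2003: 14, 2004: 11},
    "New Story":                {2003: 49, 2004: 60},
}


def updated_share(year):
    """Percent of stories that were repeated with some kind of update."""
    return sum(
        values[year]
        for category, values in FRESHNESS.items()
        if category.startswith("Repeat:")  # excludes "Exact Repeat"
    )


def substantive_share_of_updates(year):
    """Share of updated stories whose update added new substance."""
    return FRESHNESS["Repeat: New Substance"][year] / updated_share(year)


updated_2003 = updated_share(2003)
updated_2004 = updated_share(2004)
```

On these numbers, updated stories fall from 30% to 13% of the total, while the portion of updates that added new substance rises from roughly 47% to roughly 85%.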

Do the changes from year to year, particularly the rise in new stories, suggest that there is now a new news cycle on the Internet? And is the news updated continuously in an even flow? Or does it change sharply toward the end of the day, after the close of business but a good 10 or 12 hours before the morning newspaper arrives?

Based on the sites examined, which included the three most popular news sites on the Web, the Internet seems to have adopted more of a continuous news cycle than a year earlier.

Last year we found that the morning generally opened with new headlines and content. (For both years, all downloads and references to time are Eastern.) As the day wore on, new stories were less and less likely to appear as leads. What sites did was update the original morning stories. The level of substantive updating increased as the day wore on.

That pattern no longer holds. Stories are still mostly new at 9 a.m. (and for the purposes of this study all are considered new). But now the sites studied are posting more new lead stories throughout the day. More than half of the 1 p.m. stories were new (55%), as were 48% of 5 p.m. stories and 42% of 9 p.m. stories.

Story Freshness Throughout the Day

| Freshness | 1 PM | 5 PM | 9 PM |
|---|---|---|---|
| Exact Replay | 30% | 33% | 37% |
| Repeat: No New Substance | 2 | 3 | 4 |
| Repeat: New Angle | * | 0 | * |
| Repeat: New Substance | 12 | 16 | 16 |
| New Story | 55 | 48 | 42 |

*All 9 AM stories considered “New.”
Totals may not equal 100 due to rounding.

The Web and Multimedia

To what extent do news sites take advantage of the Web’s capacity for depth and incorporating multiple media?

A year earlier we found that sites varied widely in this regard. Most of those studied contained links to background information but only some added multimedia.

This year, with input from online journalists, we wanted to look deeper. We decided to examine the different types of links – video, audio, graphics and photos – and try to distinguish between those that were current, involving information less than a week old, and those that were more archival.

We also wanted to check for different levels of “interactivity.” Could users communicate with those operating the site (via e-mail, votes, or chat boards)? And did users have the ability to manipulate or tailor the content in some way?

Looking first at background and current multimedia links:

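Multimedia Components, 2004
Percent of All Stories

| Link Type | Total | AOL | ABC | CNN | Dallas | Fox | Bloom. | MSNBC | WPost | Yahoo |
|---|---|---|---|---|---|---|---|---|---|---|
| VIDEO LINKS | | | | | | | | | | |
| Current | 29% | 24% | 2% | 65% | 38% | 27% | 1% | 46% | 25% | 37% |
| Past | 5 | 1 | 1 | 4 | 1 | 12 | 0 | 3 | 20 | 2 |
| Undetermined | 2 | 6 | 1 | 4 | 3 | 4 | 0 | 1 | 3 | 3 |
| PHOTO LINKS | | | | | | | | | | |
| Current | 19% | 12% | * | 29% | 11% | 22% | 1% | 17% | 27% | 60% |
| Past | 7 | 1 | 0 | 11 | 5 | 14 | 0 | 8 | 21 | 4 |
| Undetermined | 6 | 2 | * | 17 | 2 | 15 | 0 | 3 | 9 | 15 |
| GRAPHIC LINKS | | | | | | | | | | |
| Current | 13% | 5% | 1% | 20% | 12% | 28% | 2% | 31% | 17% | 1% |
| Past | 6 | 0 | 1 | 8 | 5 | 15 | 1 | 19 | 4 | 0 |
| Undetermined | 2 | 0 | 0 | 7 | 2 | 5 | 0 | 4 | 3 | 0 |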

Next we tried to assess the level of interactivity connected to stories. Could users communicate with the site? Could they manipulate the material?

In general, online news still has a long way to go in incorporating either capacity into its content. Just a quarter of stories offered some kind of communication with the site (for example, emailing your comments to the reporter). Almost the same percentage, 24%, allowed users to manipulate or tailor the data to their own needs in some fashion, such as checking crime data for their own neighborhoods.

The sites that stand out for the ability of users to communicate with staff members about the story at hand are Yahoo! and AOL. Every single Yahoo story studied offered users the opportunity to “Rate and/or recommend” the story. On AOL’s news site, the vast majority of top news stories offer a similar exchange: “Chat or post a message.”

CNN, however, went a step further, offering customization of 90% of the content studied, often by allowing people to set up email alerts about a particular story. MSNBC connected some kind of customization to 42% of stories; Fox trailed at 32%.

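Interactivity of Online News

| Feature | Total | AOL | ABC | CNN | Dallas | Fox | Bloom. | MSNBC | WPost | Yahoo |
|---|---|---|---|---|---|---|---|---|---|---|
| Communication | 25% | 85% | 1% | 14% | 10% | 3% | 0 | 23% | 12% | 100% |
| Manipulation | 24% | 6% | 1% | 90% | 21% | 32% | 0 | 42% | 9% | 13% |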

If the Internet culture is really moving toward the “pro-sumers,” pro-active audiences who read and comment at the same time, or both consume and produce news, most news sites still have a long way to go.

Sourcing

How well sourced is Internet journalism? That discussion might begin with the question of how much audiences can tell about the sources. First, as in print, we saw marked declines in the amount of anonymous sourcing on the Internet this year. Overall, 19% of the stories cited anonymous sources, down from 39% a year earlier.

The next question is how transparent the sourcing online is, or the degree to which sites offer information about sources that enables audiences to decide what they think of the information for themselves. Just slightly less than half of all stories (46%) reached the highest level of source transparency, at least four sources with a clear attempt to explain enough about the sources’ knowledge, expertise and potential biases. There was little difference here among the sites.

This is slightly less than what we saw on newspaper front pages but substantially greater transparency than in network evening or morning news or cable.

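Source Transparency, Online News vs. Other Outlets

| Sources | Online | Newspapers | Net. Eve. | PBS | Morning |
|---|---|---|---|---|---|
| No Sources | 7% | 7% | 37% | 36% | 39% |
| 1 Source | 16 | 12 | 14 | 21 | 23 |
| 2-3 Sources | 32 | 33 | 32 | 20 | 28 |
| 4+ Sources | 46 | 48 | 17 | 23 | 11 |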

Story Depth

The next question about sourcing is how many sides of the story are included. At least in the lead stories studied, the Internet stood out among the media studied for getting multiple viewpoints. Fully 85% of stories contained a mix of views. In addition, we counted how many different types of interest groups or stakeholders were cited, a slightly different way of looking at breadth of sourcing. Online, the majority of stories (56%) had at least four different stakeholder groups, and another 21% had three.

That depth of sourcing exceeds even that found in newspapers, where 76% of stories offered a mix of opinions (82% of front-page stories) and 39% contained four or more stakeholders. No other medium studied even came close to reaching these levels.

Not only was this level of sourcing high among media studied, it was also fairly consistent across the sites examined.

Journalist Opinion

To critics, the Internet is sometimes dismissed as a medium dominated by subjective opinion. How true is that of the major news sites?

The online content studied was largely free of unattributed journalistic opinion. Fewer than one in ten of the lead stories (7%) contained opinion from journalists. That mirrors the finding for newspaper front pages (6%), though if we include metro and sports sections, the figure more than doubles to 15%.

Looking at the sites individually, MSNBC was the most likely to include journalist opinion, though still to no great degree – 19% of stories studied. The Bloomington, Illinois, Pantagraph was the least likely, with just 1%.

In the end, online stories were more likely than most other media to score well on what we call the “Reporting Index.”

To be included, stories had to meet the following conditions:

1. Four or more transparent sources
2. A mix of viewpoints
3. Four or more stakeholders

More than a quarter (26%) of the online stories studied reached the highest levels of depth and transparency. That is four or more fully identified sources, four or more stakeholders, and a mix of views.
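The three conditions can be expressed as a simple test applied to each coded story. The sketch below is a minimal illustration, assuming each story has already been coded for the three attributes; the `CodedStory` fields and function names are hypothetical and are not taken from the Project's codebook.

```python
from dataclasses import dataclass


@dataclass
class CodedStory:
    """One story as a coder might record it (field names are hypothetical)."""
    transparent_sources: int      # number of fully identified sources
    has_mix_of_viewpoints: bool   # story presents more than one side
    stakeholder_groups: int       # distinct stakeholder groups cited


def meets_reporting_index(story):
    """True only if the story satisfies all three stated conditions."""
    return (
        story.transparent_sources >= 4
        and story.has_mix_of_viewpoints
        and story.stakeholder_groups >= 4
    )


def reporting_index_share(stories):
    """Percent of stories meeting the index; 0.0 for an empty sample."""
    if not stories:
        return 0.0
    qualifying = sum(1 for s in stories if meets_reporting_index(s))
    return 100.0 * qualifying / len(stories)


# Made-up sample of four stories, for illustration only.
sample = [
    CodedStory(5, True, 4),   # meets all three conditions
    CodedStory(4, False, 6),  # fails: no mix of viewpoints
    CodedStory(3, True, 4),   # fails: too few transparent sources
    CodedStory(6, True, 3),   # fails: too few stakeholder groups
]
sample_share = reporting_index_share(sample)
```

Because all three conditions must hold at once, the index share is necessarily no higher than the lowest of the three individual measures, which helps explain why 26% is well below the 46% of stories with four or more transparent sources.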

The percentage is slightly less than for newspaper front pages, where 33% of all stories met the criteria. But it is eight percentage points above newspapers overall. On network evening news, 10% of the stories followed this model.

Lead Story Topics

When it came to the topic agenda on Web sites – what stories they led with on their front pages – the Web looks a good deal like the front pages of major newspapers, and much more traditional than either cable or broadcast television news.

Government made up a third of the stories (33%), followed by other domestic issues (24%), foreign affairs (16%), the campaign (8%) and disasters/accidents and weather catastrophes. The numbers mirror almost exactly the figures for the front pages of major national newspapers, except for the focus on disasters and accidents.

Inside those numbers are some interesting wrinkles, given the youthful nature of the Internet audience and the potential for addressing more issues on the Web. While domestic affairs was the second most popular topic, for instance, that broad category heading is somewhat deceptive. A big majority of the lead domestic-affairs stories (79%) concerned terrorism. No other single area of domestic news – public health, education, the environment, transportation, sprawl and health care – reached even a single percentage point.

Story Length

With an online environment, one can make conflicting arguments about how long stories should be. The Internet has the potential for infinite depth. Yet some people believe the computer screen is better suited to shorter stories than long scrolls of text or multiple clicks to page through a story.

Is there an optimal length? Do sites vary much?

Overall, the lead stories on the Web were similar in length to newspaper front-page articles. Roughly half of the Internet lead stories were between 500 and 1,000 words (48%), as were 44% of front-page newspaper stories. Another 32% were over 1,000 words, compared with 40% of newspaper front-page articles.

Footnotes

1. The 2003 analysis was based on four downloads a day for five days, while 2004 was one download a day for twenty days plus four downloads five of those twenty days.

2. The number of completely new stories drops if you discount the 9 a.m. stories, when most sites start fresh, to 48%, but that is still up markedly from 34% a year earlier.
