Scaling considerations


Scaling considerations

Tobias Hunger
Hi!

> It struck me as odd that I couldn't find a discussion of the scaling
> issues involved. Maybe I didn't look hard enough, maybe much of the
> discussion takes place on irc or perhaps you have defined the problem
> away :)

Yep, there could be more information on the project homepage about this
(and about every other issue :-)

> In any case, I'm concerned about file sharing between friends and what
> would happen if I was running owncloud on my home box (or even some server
> somewhere) and some of my files became too popular for my connection. As
> things are, I'd stand a good chance of getting (accidentally and without
> warning) DDoS'ed out of the internet.

Is that really that big a problem, I wonder? You are inviting people to view
your stuff after all, so you can just ask them to stop again:-)

This does look a bit different if you host content that is
visible to everybody... but then running owncloud is no different
than running any other web server on a machine at your home. Just don't
get anything hosted there mentioned on slashdot :-)

> Secondly, I would consider it very important to allow for mobility.

Moving your data is hard with a web-server based approach. Basically you need
to move the data and then ask everybody to update their URLs.

A more "cloudy" approach would include things like encrypting data, assigning
a globally unique ID to each piece of data and uploading it into some form of
storage network. P2P comes into this, too, considering ownCloud's "everybody
can run their own server" idea...

There are proposals on encrypted storage in the owncloud wiki, covering (parts
of) this. Please comment on them!

Best Regards,
Tobias


Scaling considerations

Aggelos Economopoulos
On 29/06/2010 08:15, Tobias Hunger wrote:
> Hi!
>> It struck me as odd that I couldn't find a discussion of the scaling
>> issues involved. Maybe I didn't look hard enough, maybe much of the
>> discussion takes place on irc or perhaps you have defined the problem
>> away :)
>
> Yep, there could be more information on the project homepage about this
> (and about every other issue :-)

Yes, definitely :) But discussing the issues is a necessary first step
in that direction.

>> In any case, I'm concerned about file sharing between friends and what
>> would happen if I was running owncloud on my home box (or even some server
>> somewhere) and some of my files became too popular for my connection. As
>> things are, I'd stand a good chance of getting (accidentally and without
>> warning) DDoS'ed out of the internet.
>
> Is that really that big a problem, I wonder? You are inviting people to view
> your stuff after all, so you can just ask them to stop again:-)

Well, I invite 5 people to take a look at this file, but these people
find it so interesting that they point their friends to it as well. One
of those people posts the link on some forum somewhere. Suddenly I get a
SYN storm from people that I don't know and of course can't contact
except by responding with a static error page (if that :)

> This does look a bit different if you host content that is
> visible to everybody... but then running owncloud is no different
> than running any other web server on a machine at your home. Just don't
> get anything hosted there mentioned on slashdot :-)

Yes, true. But since I am building up trust relationships as part of
owncloud, this is a huge opportunity to make use of those relationships.
E.g. my well-connected owncloud instance could automatically offer to
mirror popular files of my immediate and/or trusted friends (or even
files of their friends etc). This could be push-based. I am not claiming
that this is the best approach and I'd be very interested in other
suggestions.
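
As a minimal sketch of the push-based idea above, in Python: every name here (`FileStats`, `mirror_offers`, the hit threshold, the trust set) is invented for illustration and nothing like it exists in owncloud. It only shows how a well-connected instance might choose which files of trusted friends to offer mirrors for, assuming friends report some load figure for their files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    owner: str          # identity of the friend hosting the file (hypothetical)
    url: str            # where the file currently lives
    hits_per_hour: int  # load the owner reports for this file


def mirror_offers(stats, trusted, min_hits=100, max_offers=3):
    """Pick the busiest files of trusted friends to offer mirrors for."""
    candidates = [
        s for s in stats
        if s.owner in trusted and s.hits_per_hour >= min_hits
    ]
    # Busiest first, so a limited mirroring budget goes where it helps most.
    candidates.sort(key=lambda s: s.hits_per_hour, reverse=True)
    return candidates[:max_offers]


stats = [
    FileStats("alice", "http://alice.example/talk.ogv", 900),
    FileStats("bob", "http://bob.example/notes.txt", 5),
    FileStats("mallory", "http://mallory.example/big.iso", 5000),
]
offers = mirror_offers(stats, trusted={"alice", "bob"})
```

Here only alice's file is offered a mirror: bob's file is not busy enough and mallory is not in the trust set. Whether friends of friends qualify would be a policy decision on top of this.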

You're of course absolutely right that one can define the problem away.
However, I think that you shouldn't. After all, you're competing against
proprietary systems that offer virtually unlimited bandwidth (and mostly
sufficient amounts of space and computation power) for "free". But more
importantly, such an architecture connects the ability to reach a lot of
people with the financial ability to pay for the resources (see below).
You could argue that the existing approach of finding somebody who wants
to "sponsor" your ideas/expression/whatever and can provide the resources
is good enough, but IME there are issues. The most important one is
that when there is later a clash of ideas, it is kinda hard to move your
net presence elsewhere.

>> Secondly, I would consider it very important to allow for mobility.
>
> Moving your data is hard with a web-server based approach. Basically you need
> to move the data and then ask everybody to update their URLs.

Depends on the scenario. If I'm changing hosting providers (with my home
machine potentially being one of the providers), I could just leave
behind a (machine interpreted) redirect and people would lazily update
to my new location url. But say my account has been shut down by the
hosting provider (e.g. because of some policy rules). Then I could use
my private key and a backup of my friends list/roster to credibly notify
them of the url change (you could get more clever here).
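
The first scenario (a redirect left behind, with friends updating lazily) could look roughly like the Python sketch below. The `redirects` mapping stands in for whatever machine-readable forwarding record the old host would serve, and `resolve` / `refresh_roster` are hypothetical names. The second scenario, a notification signed with the private key, needs a signature library and is not shown.

```python
def resolve(url, redirects, max_hops=5):
    """Follow forwarding records until a URL with no redirect is reached."""
    seen = set()
    while url in redirects:
        # Refuse loops and overly long chains instead of following forever.
        if url in seen or len(seen) >= max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
        url = redirects[url]
    return url


def refresh_roster(roster, redirects):
    """Lazily rewrite each friend's stored URL to wherever it now points."""
    return {name: resolve(url, redirects) for name, url in roster.items()}


redirects = {
    "http://old.example/aggelos": "http://mid.example/a",
    "http://mid.example/a": "http://new.example/a",
}
roster = {"aggelos": "http://old.example/aggelos"}
updated = refresh_roster(roster, redirects)
```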

> A more "cloudy" approach would include things like encrypting data, assigning
> a globally unique ID to each piece of data and uploading it into some form of
> storage network. P2P comes into this, too, considering ownCloud's "everybody
> can run their own server" idea...

Well, I don't know about encrypting... my concern was mostly about
public files. If some files are encrypted for a group of people, I think
it's highly unlikely that the group is large enough and the files are
*that* big that computing/networking resources are going to be an issue.

> There are proposals on encrypted storage in the owncloud wiki, covering (parts
> of) this. Please comment on them!

We can all start proposing our favorite architectures, but IMHO that's a
bit backwards. The more immediate questions have more to do with what we
want out of the system.

Do we want it to scale without requiring the user to throw resources at
the problem?
Do we want the identity to be in the hands of the user (so that a
malicious hosting provider can only temporarily inconvenience its users)?
If we end up using keypairs, how do we handle key management (that's
probably the most interesting question)?
Do we care about forgery?
Do we want to enable end-to-end crypto between arbitrary users or
between "friends"?
Do we want to allow multiple master copies that can get out of sync?

Personally, I'd want a system where I don't have to pay for resources in
proportion to my popularity -- if I did, this would be a big incentive
to try to monetize my popularity. I don't think I need to remind anyone
interested in owncloud that system architecture is not apolitical.

In addition, I'd like to be able to use a friend's server w/o trusting
them with my identity. Even more so, I'd like to be able to provide
hosting for friends' internet presence _without_ them having to trust me
absolutely. This does not necessarily mean that all data on the server
should be encrypted.

Not all data is equally valuable anyway and I'd like to be able to
access some of it from a machine I don't trust.

As for securely exchanging data between users, I think it's something
worth doing but I don't see that as a priority. The social problems with
pushing adoption of solutions like owncloud are big enough as it is.
/Requiring/ encrypted data exchange is not realistic at this point IMHO.
Allowing for it is a worthy goal though.

Diverging versions of the data are another "interesting" problem. I'm not
sure how important that is.

My point is, we need to be clear about the requirements before we
discuss specific architecture and implementation issues.

That said, my approach to encrypted file storage would be much closer to
the first proposal (sorry Tobias :)

So, what requirements do people consider necessary? Keep in mind the
obvious tradeoff between usefulness/coolness of features on the one hand
and the implementation effort and social acceptance/usability issues on
the other.

Aggelos



Scaling considerations

Aggelos Economopoulos
In reply to this post by Tobias Hunger
On 29/06/2010 08:15, Tobias Hunger wrote:
[...]
> There are proposals on encrypted storage in the owncloud wiki, covering (parts
> of) this. Please comment on them!

OK, some thoughts on the first encrypted file storage proposal.

First wrt key management. I think there should be different key pairs
for different sets of data, so that the damage is contained when one
secret key is compromised.

One of my requirements would be to be able to access (pre-designated)
parts of my data from machines that I don't necessarily trust. In that
case, if the value of the data is such that I'd be willing to risk
someone getting access to my key in order to access that data *now*, I
could make that tradeoff more easily. In addition, the server could
enforce stricter limitations on the kind of access that I'd have (say,
on access time, volume of data transferred, allowed operations etc).

In addition you would be able to specify different expiration times for
the keys (I'm assuming you're also using the keypair for authentication
purposes so that when it expires one wouldn't have access to the data
without breaking into the server;). This is all in order to minimize the
value of the key (which, given the nature of the data one would
typically store in an owncloud instance, is already very low to begin with).
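
To make the per-key limits concrete, here is a hedged Python sketch of the server-side check. `KeyPolicy` and its fields are assumptions made up for this example, not part of any proposal on the wiki. It covers the three kinds of restriction mentioned above: expiration time, allowed operations and volume of data transferred.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class KeyPolicy:
    expires_at: datetime    # the key is rejected from this instant on
    allowed_ops: frozenset  # e.g. frozenset({"read"})
    max_bytes: int          # total transfer budget for this key
    used_bytes: int = 0

    def authorize(self, op, nbytes, now):
        """Return True and charge the budget if the request is within limits."""
        if now >= self.expires_at:
            return False
        if op not in self.allowed_ops:
            return False
        if self.used_bytes + nbytes > self.max_bytes:
            return False
        self.used_bytes += nbytes
        return True


# A low-value key for use from an untrusted machine: read-only,
# valid for two hours, capped at 1 MB of transfer.
now = datetime(2010, 6, 29, 8, 15, tzinfo=timezone.utc)
policy = KeyPolicy(now + timedelta(hours=2), frozenset({"read"}), 1_000_000)
```

With this policy a 600 kB read succeeds, a second 600 kB read is refused because it would exceed the budget, and any write or any request after the two hours is refused as well.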

On a tangent, the reason we're using encryption here is to defend
against the server administrator and random people who might break into
the server. It is possible that they would be specifically after the
owncloud data (especially someone breaking into a large hosting provider),
but our threat model is not a determined opponent who is targeting any
individual user; there are way more cost-effective ways to get to the
data of an individual.

If one trusts the server enough to store their data unencrypted, it is
easy enough to also use one-time passwords (again, with additional
limitations) for access from untrusted machines.
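
One possible way to do the one-time passwords is counter-based HOTP as specified in RFC 4226. That choice is an assumption of this sketch (the mail does not name a scheme), and `verify_once` is a hypothetical helper showing how the server would advance its counter so that a password overheard on an untrusted machine cannot be replayed.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an RFC 4226 HOTP value for the given shared secret and counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_once(secret: bytes, counter: int, submitted: str):
    """Return (ok, next_counter); the counter advances only on success."""
    ok = hmac.compare_digest(hotp(secret, counter), submitted)
    return ok, (counter + 1 if ok else counter)


secret = b"12345678901234567890"  # test secret from RFC 4226, Appendix D
ok, counter = verify_once(secret, 0, hotp(secret, 0))
replayed, counter = verify_once(secret, counter, hotp(secret, 0))
```

The additional limitations (time window, data volume, allowed operations) would be attached to the session that a successful password opens.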

I've argued in a previous mail that it is important that the user
identity is connected with a keypair instead of a URL to allow for
(semi-)transparent migration to different servers. This keypair would be
the master keypair with which the other keypairs are signed.

Now, for that keypair, you really want it to only be used from a trusted
client. Writing a standalone client is conceptually straightforward, but
involves some administrative overhead. I'd definitely want to use one,
but a complementary approach might be to make use of a javascript-heavy
webpage to provide a cross-platform client that is always available, has
few dependency issues and is always automatically upgraded to the
latest version. This would of course require the user to be able to
verify that the code the server sent them actually corresponds to the
latest version of the publicly reviewed code. Now, the best way I can
think of to do that is to have the (trusted) browser display a hash of
all the page content and publish that hash in many places on the net. I
started searching around a bit and it seems that (naturally) somebody
has started exploring that direction, including a prototype of a firefox
extension: http://corte.si/posts/security/crypsr-evolution.html (see the
previous blog posts as well).
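
The verification step could be sketched as follows in Python. This is not how the linked extension works; `page_digest`, `trusted_digest` and the quorum rule are assumptions chosen for illustration. The idea is to hash everything the server sent and accept the page only if enough independently published hashes agree with it.

```python
import hashlib
from collections import Counter


def page_digest(resources):
    """Hash all page resources (name -> bytes), in name order, into one digest."""
    h = hashlib.sha256()
    for name in sorted(resources):
        body = resources[name]
        # Frame each entry with its name and length so boundaries are unambiguous.
        h.update(f"{name}\0{len(body)}\0".encode())
        h.update(body)
    return h.hexdigest()


def trusted_digest(published, quorum):
    """Return the digest that at least `quorum` sources agree on, else None."""
    if not published:
        return None
    digest, votes = Counter(published).most_common(1)[0]
    return digest if votes >= quorum else None


def page_is_genuine(resources, published, quorum=2):
    expected = trusted_digest(published, quorum)
    return expected is not None and expected == page_digest(resources)


resources = {"index.html": b"<html>client</html>", "app.js": b"run();"}
good = page_digest(resources)
genuine = page_is_genuine(resources, [good, good, "bogus"])
tampered = page_is_genuine({**resources, "app.js": b"steal();"}, [good, good])
```

Requiring agreement between several sources means that compromising a single place where the hash is published is not enough to get a modified client accepted.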

Now, as for the issues of (public, which also implies unencrypted)
object propagation, it seems to me that the storage model would be
very similar to the git content repository, right up to using signed
tags. In fact, one could probably argue that git is 90% there already
and could be used as is, possibly with the addition of a simple layer on
top of blobs for meta-information.
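
For reference, this is how git names a blob in its SHA-1 object format: the id is the hash of a short header plus the content. The Python sketch below reproduces that computation; the in-memory `ObjectStore` around it is a hypothetical illustration of why such ids suit propagation between servers, since any mirror's copy can be checked against the id that was asked for.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id git assigns to a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self._objects[oid] = content
        return oid

    def get(self, oid: str) -> bytes:
        content = self._objects[oid]
        # A mirror cannot substitute content without changing the id.
        if git_blob_id(content) != oid:
            raise ValueError("object does not match its id")
        return content


store = ObjectStore()
oid = store.put(b"popular file contents\n")
```

Identical content always gets the same id regardless of which server stores it, so the id can serve as the location-independent name while signed tags bind it to an author.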

I don't see propagation of encrypted objects as something desirable at
this point; it would be useful if you had large, closed (invitation only
or centrally managed) user groups (say, for collaboration) but there you
have much more important issues with diverging versions of objects that
we'd have to talk about first. My approach would probably be to only act
as a rendezvous point for application-specific protocols.

Anyway, most of you are probably at Akademy and these are more notes to
myself than a complete proposal, but I'd be happy to read any comments
that you might have.

Thanks,
Aggelos